
20 December 2021

Dirk Eddelbuettel: RcppSMC 0.2.6 on CRAN: Compiler Update

A new maintenance RcppSMC release 0.2.6 arrived at CRAN yesterday. It chiefly updates the code to cope with g++-11, which defaults to C++17 and thereby brings in std::data(). And if one is not careful, as we weren't in three files, this can clash with other uses of data, as I tweeted a good week ago. Otherwise some JSS URLs now sport the preferred shorter doi form. RcppSMC provides Rcpp-based bindings to R for the Sequential Monte Carlo Template Classes (SMCTC) by Adam Johansen described in his JSS article. Sequential Monte Carlo is also referred to as Particle Filter in some contexts. The package features the Google Summer of Code work by Leah South in 2017, and by Ilya Zarubin in 2021. This release is summarized below.

Changes in RcppSMC version 0.2.6 (2021-12-17)
  • Updated URLs to JSS for the new DOI scheme upon their request
  • Adjusted three source files for C++17 compilation under g++-11

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppSMC page. Issues and bug reports should go to the GitHub issue tracker. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

21 November 2021

Antoine Beaupré: mbsync vs OfflineIMAP

After recovering from my latest email crash (previously, previously), I had to figure out which tool I should be using. I had many options, but I figured I would start with a popular one (mbsync). I also evaluated OfflineIMAP, which was resurrected from the Python 2 apocalypse and which I had used for a long time before. Read on for the details.

Benchmark setup All programs were tested against a Dovecot 1:2.3.13+dfsg1-2 server, running Debian bullseye. The client is a Purism 13v4 laptop with a Samsung SSD 970 EVO 1TB NVMe drive. The server is a custom build with an AMD Ryzen 5 2600 CPU and a RAID-1 array made of two NVMe drives (Intel SSDPEKNW010T8 and WDC WDS100T2B0C). The mail spool I am testing against has almost 400k messages and takes 13GB of disk space:
$ notmuch count --exclude=false
372758
$ du -sh --exclude xapian Maildir
13G Maildir
The baseline we are comparing against is SMD (syncmaildir), which performs the sync in about 7-8 seconds locally (3.5 seconds for each push/pull command) and about 10-12 seconds remotely. Anything close to that or better is good enough. I do not have recent numbers for an SMD full sync baseline, but the setup documentation mentions 20 minutes for a full sync. That was a few years ago, and the spool has obviously grown since then, so that is not a reliable baseline. A baseline for a full sync might also be set with rsync, which copies files at nearly 40MB/s, or 317Mb/s!
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
 12,647,814,731 100%   37.85MB/s    0:05:18 (xfr#394981, to-chk=0/395815)    
72.38user 106.10system 5:19.59elapsed 55%CPU (0avgtext+0avgdata 15988maxresident)k
8816inputs+26305112outputs (0major+50953minor)pagefaults 0swaps
That is 5 minutes to transfer the entire spool. Incremental syncs are obviously pretty fast too:
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/395815)    
1.42user 0.81system 0:03.31elapsed 67%CPU (0avgtext+0avgdata 14100maxresident)k
120inputs+0outputs (3major+12709minor)pagefaults 0swaps
As an extra curiosity, here's the performance with tar, pretty similar with rsync, minus incremental which I cannot be bothered to figure out right now:
anarcat@angela:tmp(main)$ time ssh shell.anarc.at tar --exclude xapian -cf - Maildir/ | pv -s 13G | tar xf -
56.68user 58.86system 5:17.08elapsed 36%CPU (0avgtext+0avgdata 8764maxresident)k
0inputs+0outputs (0major+7266minor)pagefaults 0swaps
12,1GiO 0:05:17 [39,0MiB/s] [===================================================================> ] 92%
It is interesting that rsync manages to almost beat a plain tar on file transfer; I'm actually surprised by how well it performs here, considering there are many little files to transfer. (But then again, this may be exactly where rsync shines: while tar needs to glue all those little files together, rsync can just directly talk to the other side and tell it to do live changes. Something to look at in another article maybe?) Since both ends are NVMe drives, those should easily saturate a gigabit link. And in fact, a backup of the server mail spool achieves a much faster transfer rate on disk:
anarcat@marcos:~$ tar fc - Maildir | pv -s 13G > Maildir.tar
15,0GiO 0:01:57 [ 131MiB/s] [===================================] 115%
That's 131 MiB per second, vastly faster than the gigabit link. The client has similar performance:
anarcat@angela:~(main)$ tar fc - Maildir | pv -s 17G > Maildir.tar
16,2GiO 0:02:22 [ 116MiB/s] [==================================] 95%
So those disks should be able to saturate a gigabit link, and they are not the bottleneck on fast links. Which raises the question of what is blocking performance of a similar transfer over the gigabit link, but that's another question altogether, because no sync program ever reaches the above performance anyway. Finally, note that when I migrated to SMD, I wrote a small performance comparison that could be interesting here. It shows SMD to be faster than OfflineIMAP, but not by as much as we see here. In fact, it looks like OfflineIMAP slowed down significantly since then (May 2018), but this could be due to my larger mail spool as well.

mbsync The isync (AKA mbsync) project is written in C and supports syncing Maildir and IMAP folders, with possibly multiple replicas. I haven't tested this but I suspect it might be possible to sync between two IMAP servers as well. It supports partial mirrors, message flags, full folder support, and "trash" functionality.

Complex configuration file I started with this .mbsyncrc configuration file:
SyncState *
Sync New ReNew Flags
IMAPAccount anarcat
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPStore anarcat-remote
Account anarcat
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir-mbsync/
Channel anarcat
# AKA Far, convert when all clients are 1.4+
Master :anarcat-remote:
# AKA Near
Slave :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
Patterns *
# Automatically create missing mailboxes, both locally and on the server
#Create Both
Create slave
# Sync the movement of messages between folders and deletions, add after making sure the sync works
#Expunge Both
Long gone are the days where I would spend a long time reading a manual page to figure out the meaning of every option. If that's your thing, you might like this one. But I'm more of an "EXAMPLES section" kind of person now, and I somehow couldn't find a sample file on the website. I started from the Arch wiki one but it's actually not great because it's made for Gmail (which is not a usual Dovecot server). So a sample config file in the manpage would be a great addition. Thankfully, the Debian package ships one in /usr/share/doc/isync/examples/mbsyncrc.sample but I only found that after I wrote my configuration. It was still useful and I recommend people take a look if they want to understand the syntax. Also, that syntax is a little overly complicated. For example, Far needs colons, like:
Far :anarcat-remote:
Why? That seems just too complicated. I also found that sections are not clearly identified: IMAPAccount and Channel mark section beginnings, for example, which is not at all obvious until you learn about mbsync's internals. There are also weird ordering issues: the SyncState option needs to be before IMAPAccount, presumably because it's global. Using a more standard format like .INI or TOML could improve that situation.

Stellar performance A transfer of the entire mail spool takes 56 minutes and 6 seconds, which is impressive. It's not quite "line rate": the resulting mail spool was 12GB (which is a problem, see below), which turns out to be about 29Mbit/s and therefore not maxing the gigabit link, and an order of magnitude slower than rsync. The incremental runs are roughly 2 seconds, which is even more impressive, as that's actually faster than rsync:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.015       0.052       1.930       2.029       2.105       
user        0.660       0.040       0.592       0.661       0.722       
sys         0.338       0.033       0.268       0.341       0.387    
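(The timing tables here and below come from multitime, a small benchmarking tool packaged in Debian. The exact invocation is not shown in the output above; a plausible one, with the repeat count of 5 being an assumption rather than something stated in the text, would be:)
# run the incremental sync several times and report mean/min/median/max timings
multitime -n 5 mbsync -a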
Those tests were performed with isync 1.3.0-2.2 on Debian bullseye. Tests with a newer isync release originally failed because of a corrupted message that triggered bug 999804 (see below). Running 1.4.3 under valgrind works around the bug, but adds a 50% performance cost, the full sync running in 1h35m. Once the upstream patch is applied, performance with 1.4.3 is fairly similar, considering that the new sync included the register folder with 4000 messages:
120.74user 213.19system 59:47.69elapsed 9%CPU (0avgtext+0avgdata 105420maxresident)k
29128inputs+28284376outputs (0major+45711minor)pagefaults 0swaps
That is ~13GB in ~60 minutes, which gives us 28.3Mbps. Incrementals are also pretty similar to 1.3.x, again considering the double-connect cost:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.500       0.087       2.340       2.491       2.629       
user        0.718       0.037       0.679       0.711       0.793       
sys         0.322       0.024       0.284       0.320       0.365
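As a quick sanity check on the ~28-29 Mbit/s figures above, the arithmetic can be redone with a bc one-liner (using decimal gigabytes and a rounded elapsed time, so it only approximates the numbers quoted in the text):
# 13 GB transferred in roughly 59m48s (3588 seconds), expressed in Mbit/s
echo "scale=1; 13 * 1000 * 8 / 3588" | bc
# prints 28.9, in the same ballpark as the 28.3Mbps above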
Those tests were all done on a Gigabit link, but what happens on a slower link? My server uplink is slow: 25 Mbps down, 6 Mbps up. There mbsync is worse than the SMD baseline:
===> multitime results
1: mbsync -a
Mean        Std.Dev.    Min         Median      Max
real        31.531      0.724       30.764      31.271      33.100      
user        1.858       0.125       1.721       1.818       2.131       
sys         0.610       0.063       0.506       0.600       0.695       
That's 30 seconds for a sync, which is an order of magnitude slower than SMD.

Great user interface Compared to OfflineIMAP and (ahem) SMD, the mbsync UI is kind of neat:
anarcat@angela:~(main)$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 1/2  B: 204/205  F: +0/0 *0/0 #0/0  N: +1/200 *0/0 #0/0
(Note the nice switch away from slavery-related terms too.) The display is minimal, and yet informative. It's not obvious what it means at first glance, but the manpage is useful at least for clarifying that:
This represents the cumulative progress over channels, boxes, and messages affected on the far and near side, respectively. The message counts represent added messages, messages with updated flags, and trashed messages, respectively. No attempt is made to calculate the totals in advance, so they grow over time as more information is gathered. (Emphasis mine).
In other words:
  • C 2/2: channels done/total (2 done out of 2)
  • B 204/205: mailboxes done/total (204 out of 205)
  • F: changes on the far side
  • N: +10/200 *0/0 #0/0: changes on the "near" side:
    • +10/200: 10 out of 200 messages downloaded
    • *0/0: no flag changed
    • #0/0: no message deleted
You get used to it, in a good way. It does not, unfortunately, show up when you run it in systemd, which is a bit annoying as I like to see a summary of mail traffic in the logs.

Interoperability issue In my notmuch setup, I have bound key S to "mark spam", which basically assigns the tag spam to the message and removes a bunch of others. Then I have a notmuch-purge script which moves that message to the spam folder, for training purposes. It basically does this:
notmuch search --output=files --format=text0 "$search_spam" \
      | xargs -r -0 mv -t "$HOME/Maildir/${PREFIX}junk/cur/"
This method, which worked fine in SMD (and also OfflineIMAP), created this error on sync:
Maildir error: duplicate UID 37578.
And indeed, there are now two messages with that UID in the mailbox:
anarcat@angela:~(main)$ find Maildir/.junk/ -name '*U=37578*'
Maildir/.junk/cur/1637427889.134334_2.angela,U=37578:2,S
Maildir/.junk/cur/1637348602.2492889_221804.angela,U=37578:2,S
This is actually a known limitation or, as mbsync(1) calls it, a "RECOMMENDATION":
When using the more efficient default UID mapping scheme, it is important that the MUA renames files when moving them between Maildir folders. Mutt always does that, while mu4e needs to be configured to do it:
(setq mu4e-change-filenames-when-moving t)
So it seems I would need to fix my script. It's unclear how the paths should be renamed, which is unfortunate: I would need to change my script to adapt to mbsync, but I can't tell how just from reading the above. (A manual fix is to rename the file to remove the U= field: mbsync will then generate a new one and sync correctly.) Fortunately, someone else already fixed that issue: afew, a notmuch tagging script (much puns, such hurt), has a move mode that can rename files correctly, specifically designed to deal with mbsync. I had already been told about afew, but it's one more reason to standardize my notmuch hooks on that project, it looks like. Update: I have tried to use afew and found it has significant performance issues. It also has a completely different paradigm to what I am used to: it assumes all incoming mail has the new tag and lays its own tags on top of that (inbox, sent, etc.). It can only move files from one folder at a time (see this bug), which breaks my spam training workflow. In general, I sync my tags into folders (e.g. ham, spam, sent) and message flags (e.g. inbox is F, unread is "not S", etc.), and afew is not well suited for this (although there are hacks that try to fix this). I have worked hard to make my tagging scripts idempotent, and that is something afew doesn't currently have. Still, it would be better to have that code in Python than bash, so maybe I should consider my options here.
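For what it's worth, here is a minimal sketch of that manual fix, assuming a Maildir++ layout like the one above; the helper name and the .junk destination are illustrative placeholders, not something mbsync or notmuch provide:
# move a message into the junk folder while stripping the ",U=<n>" token
# that mbsync stores in the filename, so it can assign a fresh UID on the next sync
move_to_junk() {
    src="$1"
    dest="$HOME/Maildir/.junk/cur"                       # assumed junk folder
    clean="$(basename "$src" | sed -E 's/,U=[0-9]+//')"  # drop the UID field
    mv "$src" "$dest/$clean"
}
# example use with the notmuch query from the script above (bash syntax):
# notmuch search --output=files --format=text0 "$search_spam" \
#     | while IFS= read -r -d '' f; do move_to_junk "$f"; done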

Stability issues The newer release in Debian bookworm (currently at 1.4.3) has stability issues on full sync. I filed bug 999804 in Debian about this, which led to a thread on the upstream mailing list. I have found at least three distinct crashes that could be double-free bugs "which might be exploitable in the worst case", not a reassuring prospect. The thing is: mbsync is really fast, but the downside of that is that it's written in C, and with that comes a whole set of security issues. The Debian security tracker has only three CVEs on isync, but the above issues show there could be many more. Reading the source code certainly did not make me very comfortable with trusting it with untrusted data. I considered sandboxing it with systemd (below) but running it as a systemd --user service makes that difficult. I also considered using an AppArmor profile but that is not trivial because we need to allow SSH and only some parts of it... Thankfully, upstream has been diligent at addressing the issues I have found. They provided a patch within a few days which did fix the sync issues. Update: upstream actually took the issue very seriously. They not only got CVE-2021-44143 assigned for my bug report, they also audited the code and found several more issues collectively identified as CVE-2021-3657, which actually also affect 1.3 (i.e. Debian 11/bullseye/stable). Somehow my corpus doesn't trigger that issue, but it was still considered serious enough to warrant a CVE. So on the one hand: excellent response from upstream; but on the other hand: how many more of those could there be in there?

Automation with systemd The Arch wiki has instructions on how to setup mbsync as a systemd service. It suggests using the --verbose (-V) flag which is a little intense here, as it outputs 1444 lines of messages. I have used the following .service file:
[Unit]
Description=Mailbox synchronization service
ConditionHost=!marcos
Wants=network-online.target
After=network-online.target
Before=notmuch-new.service
[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync -a
Nice=10
IOSchedulingClass=idle
NoNewPrivileges=true
[Install]
WantedBy=default.target
And the following .timer:
[Unit]
Description=Mailbox synchronization timer
ConditionHost=!marcos
[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service
[Install]
WantedBy=timers.target
Note that we trigger notmuch through systemd, both with the Before= directive above and by referencing mbsync.service in the notmuch-new.service file:
[Unit]
Description=notmuch new
After=mbsync.service
[Service]
Type=oneshot
Nice=10
ExecStart=/usr/bin/notmuch new
[Install]
WantedBy=mbsync.service
An improvement over polling repeatedly with a .timer would be to wake up only on IMAP notify, but neither imapnotify nor goimapnotify seem to be packaged in Debian. It would also not cover the "sent folder" use case, where we need to wake up on local changes.

Password-less setup The sample file suggests this should work:
IMAPStore remote
Tunnel "ssh -q host.remote.com /usr/sbin/imapd"
Add BatchMode, restrict to IdentitiesOnly, provide a password-less key just for this, add compression (-C), find the Dovecot imap binary, and you get this:
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
And it actually seems to work:
$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 0/2  B: 0/1  F: +0/0 *0/0 #0/0  N: +0/0 *0/0 #0/0imap(anarcat): Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied
C: 2/2  B: 205/205  F: +0/0 *0/0 #0/0  N: +1/1 *3/3 #0/0imap(anarcat)<1611280><90uUOuyElmEQlhgAFjQyWQ>: Info: Logged out in=10808 out=15396642 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=1 body_bytes=8087
It's a bit noisy, however. dovecot/imap doesn't have a "usage" to speak of, but even the source code doesn't hint at a way to disable that Error message, so that's unfortunate. That socket is owned by root:dovecot so presumably Dovecot runs the imap process as $user:dovecot, which we can't do here. Oh well? Interestingly, the SSH setup is not faster than IMAP. With IMAP:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.367       0.065       2.220       2.376       2.458       
user        0.793       0.047       0.731       0.776       0.871       
sys         0.426       0.040       0.364       0.434       0.476
With SSH:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.515       0.088       2.274       2.532       2.594       
user        0.753       0.043       0.645       0.766       0.804       
sys         0.328       0.045       0.212       0.340       0.393
Basically: 200ms slower. Tolerable.

Migrating from SMD The above was how I migrated to mbsync on my first workstation. The work on the second one was more streamlined, especially since the corruption on mailboxes was fixed:
  1. install isync, with the patch:
    dpkg -i isync_1.4.3-1.1~_amd64.deb
    
  2. copy all files over from previous workstation to avoid a full resync (optional):
    rsync -a --info=progress2 angela:Maildir/ Maildir-mbsync/
    
  3. rename all files to match new hostname (optional):
    find Maildir-mbsync/ -type f -name '*.angela,*' -print0 | rename -0 's/\.angela,/\.curie,/'
    
  4. trash the notmuch database (optional):
    rm -rf Maildir-mbsync/.notmuch/xapian/
    
  5. disable all smd and notmuch services:
    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
    
  6. do one last sync with smd:
    smd-pull --show-tags ; smd-push --show-tags ; notmuch new ; notmuch-sync-flagged -v
    
  7. backup notmuch on the client and server:
    notmuch dump | pv > notmuch.dump
    
  8. backup the maildir on the client and server:
    cp -al Maildir Maildir-bak
    
  9. create the SSH key:
    ssh-keygen -t ed25519 -f .ssh/id_ed25519_mbsync
    cat .ssh/id_ed25519_mbsync.pub
    
  10. add to .ssh/authorized_keys on the server, like this: command="/usr/lib/dovecot/imap",restrict ssh-ed25519 AAAAC...
  11. move old files aside, if present:
    mv Maildir Maildir-smd
    
  12. move new files in place (CRITICAL SECTION BEGINS!):
    mv Maildir-mbsync Maildir
    
  13. run a test sync, only pulling changes: mbsync --create-near --remove-none --expunge-none --noop anarcat-register
  14. if that works well, try with all mailboxes: mbsync --create-near --remove-none --expunge-none --noop -a
  15. if that works well, try again with a full sync: mbsync register, then mbsync -a
  16. reindex and restore the notmuch database, this should take ~25 minutes:
    notmuch new
    pv notmuch.dump | notmuch restore
    
  17. enable the systemd services and retire the smd-* services:
    systemctl --user enable mbsync.timer notmuch-new.service
    systemctl --user start mbsync.timer
    rm ~/.config/systemd/user/smd*
    systemctl daemon-reload
During the migration, notmuch helpfully told me the full list of those lost messages:
[...]
Warning: cannot apply tags to missing message: CAN6gO7_QgCaiDFvpG3AXHi6fW12qaN286+2a7ERQ2CQtzjSEPw@mail.gmail.com
Warning: cannot apply tags to missing message: CAPTU9Wmp0yAmaxO+qo8CegzRQZhCP853TWQ_Ne-YF94MDUZ+Dw@mail.gmail.com
Warning: cannot apply tags to missing message: F5086003-2917-4659-B7D2-66C62FCD4128@gmail.com
[...]
Warning: cannot apply tags to missing message: mailman.2.1316793601.53477.sage-members@mailman.sage.org
Warning: cannot apply tags to missing message: mailman.7.1317646801.26891.outages-discussion@outages.org
Warning: cannot apply tags to missing message: notmuch-sha1-000458df6e48d4857187a000d643ac971deeef47
Warning: cannot apply tags to missing message: notmuch-sha1-0079d8e0c3340e6f88c66f4c49fca758ea71d06d
Warning: cannot apply tags to missing message: notmuch-sha1-0194baa4cfb6d39bc9e4d8c049adaccaa777467d
Warning: cannot apply tags to missing message: notmuch-sha1-02aede494fc3f9e9f060cfd7c044d6d724ad287c
Warning: cannot apply tags to missing message: notmuch-sha1-06606c625d3b3445420e737afd9a245ae66e5562
Warning: cannot apply tags to missing message: notmuch-sha1-0747b020f7551415b9bf5059c58e0a637ba53b13
[...]
As detailed in the crash report, all of those were actually innocuous and could be ignored. Also note that we completely trash the notmuch database because it's actually faster to reindex from scratch than to let notmuch slowly figure out that all mails are new and all the old mails are gone. The fresh indexing took:
nov 19 15:08:54 angela notmuch[2521117]: Processed 384679 total files in 23m 41s (270 files/sec.).
nov 19 15:08:54 angela notmuch[2521117]: Added 372610 new messages to the database.
A reindexing on top of an existing database, by contrast, ran at about half that speed, at about 120 files/sec.

Current config file Putting it all together, I ended up with the following configuration file:
SyncState *
Sync All
# IMAP side, AKA "Far"
IMAPAccount anarcat-imap
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-remote
Account anarcat-tunnel
# Maildir side, AKA "Near"
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir/
# what binds Maildir and IMAP
Channel anarcat
Far :anarcat-remote:
Near :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
#Patterns *
Patterns * !register  !.register
# Automatically create missing mailboxes, both locally and on the server
Create Both
#Create Near
# Sync the movement of messages between folders and deletions, add after making sure the sync works
Expunge Both
# Propagate mailbox deletion
Remove both
IMAPAccount anarcat-register-imap
Host imap.anarc.at
User register
PassCmd "pass imap.anarc.at-register"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-register-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C register@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-register-remote
Account anarcat-register-tunnel
MaildirStore anarcat-register-local
SubFolders Maildir++
Inbox ~/Maildir/.register/
Channel anarcat-register
Far :anarcat-register-remote:
Near :anarcat-register-local:
Create Both
Expunge Both
Remove both
Note that it may be out of sync with my live (and private) configuration file, as I do not publish my "dotfiles" repository publicly for security reasons.
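A configuration like this can be sanity-checked without touching any mail by using mbsync's list mode, which only enumerates the mailboxes each channel would operate on (this assumes the -l/--list flag behaves as documented):
# list the mailboxes matched by every channel instead of syncing them
mbsync -l -a
# or restrict the check to a single channel
mbsync -l anarcat-register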

OfflineIMAP I used OfflineIMAP for a long time before switching to SMD. I don't exactly remember why or when I started using it, but I do remember it became painfully slow as I started using notmuch, and it would sometimes crash mysteriously. It's been a while, so my memory is hazy on that. It also kind of died in a fire when Python 2 stopped being maintained. The main author moved on to a different project, imapfw, which could serve as a framework to build IMAP clients, but it never seemed to implement all of the OfflineIMAP features and certainly not configuration file compatibility. Thankfully, a new team of volunteers ported OfflineIMAP to Python 3 and we can now test that new version to see if it is an improvement over mbsync.

Crash on full sync The first thing that happened on a full sync is this crash:
Copy message from RemoteAnarcat:junk:
 ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Thread 'Copy message from RemoteAnarcat:junk' terminated with exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/imaputil.py", line 406, in utf7m_decode
    for c in binary.decode():
AttributeError: 'memoryview' object has no attribute 'decode'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/threadutil.py", line 146, in run
    Thread.run(self)
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Last 1 debug messages logged for Copy message from RemoteAnarcat:junk prior to exception:
thread: Register new thread 'Copy message from RemoteAnarcat:junk' (account 'Anarcat')
ERROR: Exceptions occurred during the run!
ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
Folder junk [acc: Anarcat]:
 Copy message UID 30626 (29008/49310) RemoteAnarcat:junk -> LocalAnarcat:junk
Command exited with non-zero status 100
5252.91user 535.86system 3:21:00elapsed 47%CPU (0avgtext+0avgdata 846304maxresident)k
96344inputs+26563792outputs (1189major+2155815minor)pagefaults 0swaps
That only transferred about 8GB of mail, which gives us a transfer rate of 5.3Mbit/s, more than 5 times slower than mbsync. This bug is possibly limited to the bullseye version of offlineimap3 (the lovely 0.0~git20210225.1e7ef9e+dfsg-4), while the current sid version (the equally gorgeous 0.0~git20211018.e64c254+dfsg-1) seems unaffected.

Tolerable performance The new release still crashes, except it does so at the very end, which is an improvement, since the mails do get transferred:
 *** Finished account 'Anarcat' in 511:12
ERROR: Exceptions occurred during the run!
ERROR: Exception parsing message with ID (<20190619152034.BFB8810E07A@marcos.anarc.at>) from imaplib (response type: bytes).
 AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: Exception parsing message with ID (<40A270DB.9090609@alternatives.ca>) from imaplib (response type: bytes).
 AttributeError: decoding with 'x-mac-roman' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: IMAP server 'RemoteAnarcat' does not have a message with UID '32686'
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 889, in _fetch_from_imap
    raise OfflineImapError(reason, severity)
Command exited with non-zero status 1
8273.52user 983.80system 8:31:12elapsed 30%CPU (0avgtext+0avgdata 841936maxresident)k
56376inputs+43247608outputs (811major+4972914minor)pagefaults 0swaps
"offlineimap  -o " took 8 hours 31 mins 15 secs
This is 8h31m for transferring 12G, which is around 3.1Mbit/s. That is nine times slower than mbsync, almost an order of magnitude! Now that we have a full sync, we can test incremental synchronization. That is also much slower:
===> multitime results
1: sh -c "offlineimap -o || true"
            Mean        Std.Dev.    Min         Median      Max
real        24.639      0.513       23.946      24.526      25.708      
user        23.912      0.473       23.404      23.795      24.947      
sys         1.743       0.105       1.607       1.729       2.002
That is also an order of magnitude slower than mbsync, and significantly slower than what you'd expect from a sync process. ~30 seconds is long enough to make me impatient and distracted; 3 seconds, less so: I can wait and see the results almost immediately.

Integrity check That said: this is still on a gigabit link. It's technically possible that OfflineIMAP performs better than mbsync over a slow link, but I haven't tested that theory. The OfflineIMAP mail spool is missing quite a few messages as well:
anarcat@angela:~(main)$ find Maildir-offlineimap -type f -type f -a \! -name '.*' | wc -l
381463
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l
385247
... although that's probably all either new messages or the register folder, so OfflineIMAP might actually be in a better position there. But digging in more, it seems like the actual per-folder diff is fairly similar to mbsync: a few messages missing here and there. Considering OfflineIMAP's instability and poor performance, I have not looked any deeper into those discrepancies.
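To pin down where the missing messages actually are, a rough per-folder comparison can be done in the shell; this is only a sketch, and it assumes both spools use the same Maildir++ folder names:
# count non-hidden files per directory in each spool, then diff the counts
for d in Maildir Maildir-offlineimap; do
    find "$d" -type f ! -name '.*' -printf '%h\n' \
        | sed "s,^$d,," | sort | uniq -c > "/tmp/counts-$d.txt"
done
diff -u /tmp/counts-Maildir.txt /tmp/counts-Maildir-offlineimap.txt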

Other projects to evaluate Those are all the options I have considered, in alphabetical order:
  • doveadm-sync: requires dovecot on both ends, can tunnel over SSH, may have performance issues in incremental sync, written in C
  • fdm: fetchmail replacement; IMAP/POP3/stdin/Maildir/mbox/NNTP support, SOCKS support (for Tor), complex rules for delivering to specific mailboxes, adding headers, piping to commands, etc.; discarded because there is no (real) support for keeping mail on the server, and written in C
  • getmail: fetchmail replacement, IMAP/POP3 support, supports incremental runs, classification rules, Python
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap, but requires running an IMAP server locally, Perl
  • isync/mbsync: TLS client certs and SSH tunnels, fast, incremental, IMAP/POP/Maildir support, multiple mailboxes, trash and recursion support, and generally has good words from multiple Debian and notmuch people (Arch tutorial); written in C, reviewed above
  • mail-sync: notify support, happens over any piped transport (e.g. SSH), diff/patch system, requires a binary on both ends, mentions UUCP in the manpage, mentions rsmtp which is a nice name for rsendmail; not evaluated because it seems awfully complex to set up, Haskell
  • nncp: treat the local spool as another mail server, not really compatible with my "multiple clients" setup, Golang
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, review above, Python
Most projects were not evaluated due to lack of time.

Conclusion I'm now using mbsync to sync my mail. I'm a little disappointed by the synchronisation times over the slow link, but I guess that's par for the course if we use IMAP: we are bound by the network speed much more than with custom protocols. I'm also worried about the C implementation and the crashes I have witnessed, but I am encouraged by the fast upstream response. Time will tell if I will stick with that setup. I'm certainly curious about the promises of interimap and mail-sync, but I have run out of time on this project.

9 September 2021

Dirk Eddelbuettel: RcppSMC 0.2.5 on CRAN: Build Update

A week after the 0.2.5 release bringing the recent Google Summer of Code work for RcppSMC to CRAN, we have a minor bug-fix release consisting, essentially, of one line. Everybody's favourite OS and toolchain did not know what to make of pow(), and I seemingly failed to test there, so shame on me. But now all is good thanks to proper use of std::pow(). RcppSMC provides Rcpp-based bindings to R for the Sequential Monte Carlo Template Classes (SMCTC) by Adam Johansen described in his JSS article. Sequential Monte Carlo is also referred to as Particle Filter in some contexts. The package now features the Google Summer of Code work by Leah South in 2017, and by Ilya Zarubin in 2021. This release is summarized below.

Changes in RcppSMC version 0.2.5 (2021-09-09)
  • Compilation under Solaris is aided via std::pow use (Dirk in #65 fixing #64)

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppSMC page. Issues and bug reports should go to the GitHub issue tracker. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

3 September 2021

Dirk Eddelbuettel: RcppSMC 0.2.4 on CRAN: Even More GSoC !!

A brand new release 0.2.4 of the RcppSMC package arrived on CRAN earlier today, with a dual delay from CRAN closing for a well-earned break and then being overwhelmed when reopening. Other than that the processing was again very smooth. RcppSMC provides Rcpp-based bindings to R for the Sequential Monte Carlo Template Classes (SMCTC) by Adam Johansen described in his JSS article. Sequential Monte Carlo is also referred to as Particle Filter in some contexts. The package started when I put some Rcpp bindings together based on Adam's paper and library. It grew when Adam and I supervised Leah South during the 2017 iteration of the Google Summer of Code. And now it grew again as we have Adam, Leah and myself looking over the shoulders of Ilya Zarubin, who did very fine work during the 2021 iteration of the Google Summer of Code that just concluded! So we are now GSoC squared! This release is effectively all work by Ilya and is summarized below.

Changes in RcppSMC version 0.2.4 (2021-09-01)
  • Multiple Sequential Monte Carlo extensions (Ilya Zarubin as part of Google Summer of Code 2021)
    • Provide informative user output (convergence diagnostics) for PMMH example #50 (Ilya in #50 and #52 addressing #25, bullet point 5)
    • Support for tracking of ancestral lines for base sampler class (Ilya in #56)
    • Support for conditional SMC via derived conditionalSampler class (Ilya in #60)
  • Add URL and BugReports to DESCRIPTION (Dirk in #53)

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppSMC page. Issues and bug reports should go to the GitHub issue tracker. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

25 August 2021

Jelmer Vernooij: Thousands of Debian packages updated from their upstream Git repository

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. Linux distributions like Debian fulfill an important function in the FOSS ecosystem: they are system integrators that take existing free and open source software projects and adapt them where necessary to work well together. They also make it possible for users to install more software in an easy and consistent way and with some degree of quality control and review. One of the consequences of this model is that the distribution package often lags behind upstream releases. This is especially true for distributions that have tighter integration and standardization (such as Debian), and often new upstream code is only imported irregularly because it is a manual process: it involves both updating the package and making sure that it still works well together with the rest of the system. The process of importing a new upstream used to be (well, back when I started working on Debian packages) fairly manual and something like this:

Ecosystem Improvements However, there have been developments over the last decade that make it easier to import new upstream releases into Debian packages.
Uscan and debian QA watch Uscan and debian/watch have been around for a while and make it possible to find upstream tarballs. A debian/watch file usually looks something like this:
version=4
http://somesite.com/dir/filenamewithversion.tar.gz
The QA watch service regularly polls all watch locations in the archive and makes the information available, so it's possible to know which packages have changed without downloading each one of them.
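Locally, roughly the same check the QA watch service performs can be done with uscan itself; assuming a reasonably recent devscripts, --report only reports what it finds without downloading anything:
# from inside an unpacked source package, check debian/watch for newer upstream versions
uscan --report --verbose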
Git Git is fairly ubiquitous nowadays, and most upstream projects and packages in Debian use it. There are still exceptions that do not use any version control system or that use a different control system, but they are becoming increasingly rare. [1]
debian/upstream/metadata DEP-12 specifies a file format with metadata about the upstream project that a package was based on. Particularly relevant for our case is the fact that it has fields for the location of the upstream version control repository. debian/upstream/metadata files look something like this:
---
Repository: https://www.dulwich.io/code/dulwich/
Repository-Browse: https://www.dulwich.io/code/dulwich/
While DEP-12 is still a draft, it has already been widely adopted - there are about 10000 packages in Debian that ship a debian/upstream/metadata file with Repository information.
Autopkgtest The Autopkgtest standard and associated tooling provide a way to run a defined set of tests against an installed package. This makes it possible to verify that a package is working correctly as part of the system as a whole. ci.debian.net regularly runs these tests against Debian packages to detect regressions.
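For reference, the same tests that ci.debian.net runs can also be run locally with the autopkgtest command; the .changes filename and the schroot name below are placeholders for whatever is available on the local machine:
# run a package's declared tests against its freshly built binaries
# inside an existing sid schroot
autopkgtest mypackage_1.2-1_amd64.changes -- schroot unstable-amd64-sbuild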
Vcs-Git headers The Vcs-Git headers in debian/control are the equivalent of the Repository field in debian/upstream/metadata, but for the packaging repositories (as opposed to the upstream ones). They've been around for a while and are widely adopted, as can be seen from zack's stats. The vcswatch service, which regularly polls packaging repositories to see whether they have changed, makes it a lot easier to consume this information in a usable way.
Debhelper adoption Over the last couple of years, Debian has slowly been converging on a single build tool: debhelper's dh interface. Being able to rely on a single build tool makes it easier to write code to update packaging when upstream changes require it.
Debhelper DWIM Debhelper (and its helpers) can increasingly figure out how to do the Right Thing in many cases without being explicitly configured. This makes packaging less effort, but also means that it's less likely that importing a new upstream version will require updates to the packaging. With all of these improvements in place, it actually becomes feasible in a lot of situations to update a Debian package to a new upstream version automatically. Of course, this requires that all of this information is available, so it won't work for all packages. In some cases, the packaging for the older upstream version might not apply to the newer upstream version. The Janitor has attempted to import a new upstream Git snapshot and a new upstream release for every package in the archive where a debian/watch file or debian/upstream/metadata file is present. These are the steps it uses (a rough manual equivalent is sketched after the list below):
  • Find new upstream version
    • If release, use debian/watch - or maybe tagged in upstream repository
    • If snapshot, use debian/upstream/metadata's Repository field
    • If neither is available, use guess-upstream-metadata from upstream-ontologist to guess the upstream Repository
  • Merge upstream version into packaging repository, possibly importing tarballs using pristine-tar
  • Update the changelog file to mention the new upstream version
  • Run some checks to ensure there are no unintentional changes, e.g.:
    • Scan diff between old and new for surprising license changes
      • Today, abort if there are any - in the future, maybe update debian/copyright
    • Check for obvious compatibility breaks - e.g. sonames changing
  • Attempt to update the packaging to reflect upstream changes
    • Refresh patches
  • Attempt to build the package with deb-fix-build, to deal with any missing dependencies
  • Run the autopkgtests with deb-fix-build to deal with missing dependencies, and abort if any tests fail
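For comparison, a rough manual equivalent of these steps for a single git-maintained package might look like the sketch below. This is not what the Janitor itself runs; it just strings together the usual tools (uscan, git-buildpackage, quilt, sbuild, autopkgtest), and the chroot and file names are placeholders:
# inside the package's packaging repository
uscan --report --verbose                  # 1. check for a new upstream release
gbp import-orig --uscan --pristine-tar    # 2. download and merge it, keeping a pristine-tar branch
gbp dch --auto                            # 3. draft a changelog entry for the new upstream version
quilt push -a                             # 4. check that the patches still apply
sbuild -d unstable                        # 5. try to build the package
autopkgtest ../mypackage_*.changes -- schroot unstable-amd64-sbuild   # 6. run the autopkgtests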
Results When run over all packages in unstable (sid), this process works for a surprising number of them.
Fresh Releases For fresh-releases (aka imports of upstream releases), processing all packages maintained in Git for which QA watch reports new releases (about 11,000) means about 2300 packages updated, and about 4000 unchanged.
Fresh Snapshots For fresh-snapshots (aka imports of the latest Git commit from upstream), processing all packages maintained in Git (about 26,000) means 5100 packages updated and 2100 for which there was nothing to do, i.e. no upstream commits since the last Debian upload. As can be seen, this works for a surprising fraction of packages. It's possible to get the numbers up even higher, by improving the tooling, the autopkgtests, and the metadata that is provided by packages.
Using these packages All the packages that have been built can be accessed from the Janitor APT repository. More information can be found at https://janitor.debian.net/fresh, but in short - run:
echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
    https://janitor.debian.net/ fresh-snapshots main | sudo tee /etc/apt/sources.list.d/fresh-snapshots.list
echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
    https://janitor.debian.net/ fresh-releases main | sudo tee /etc/apt/sources.list.d/fresh-releases.list
sudo curl -o /usr/share/keyrings/debian-janitor-archive-keyring.gpg https://janitor.debian.net/pgp_keys
apt update
And then you can install packages from the fresh-snapshots (upstream git snapshots) or fresh-releases suites on a case-by-case basis by running something like:
apt install -t fresh-snapshots r-cran-roxygen2
Most packages are updated based on information provided by vcswatch and QA watch, but it's also possible for upstream repositories to call a web hook to trigger a refresh of a package. These packages were built against unstable, but should in almost all cases also work for testing.
Caveats Of course, since these packages are built automatically without human supervision, it's likely that some of them will have bugs in them that would otherwise have been caught by the maintainer.
[1] I'm not saying that a monoculture is great here, but it does help distributions.

10 August 2021

Shirish Agarwal: BBI, IP report, State Borders and Civil Aviation I

"If I have seen further, it is by standing on the shoulders of Giants" (Isaac Newton, 1675, although it should be credited to the 12th-century Bernard of Chartres). You will see why I have shared this when we get to the beginning of Civil Aviation history itself.

Comments on the BBI court case which happened in Kenya, and the subsequent appeal. I am not going to share much about the coverage of the BBI appeal as Gautam Bhatia has shared his observations quite eloquently, both on the initial case and the subsequent appeal, which lasted 5 days in Kenya and was shown all around the world thanks to YouTube. One of the interesting points which stuck with me was that in Kenya, sign language is one of the official languages. And in fact, I was able to read quite a bit about the various sign languages which are there in Kenya. It just boggles the mind that there are countries that give importance to such things even though they are not as rich or as developed as what we call developed economies. I probably might give it more space and more depth as it does carry some important judicial jurisprudence which is and will be felt around the world. How India reacts or doesn't is probably another matter altogether. But yes, it needs its own space, maybe after some more time. Report on the Standing Committee on IP Regulation in India and the false promises. Again, I do not want to take much time in sharing details about what the report contains, as the report can be found here. I have uploaded it on WordPress, in case of an issue. An observation on the same subject can be found here. At least to me, and probably to those who have been following the IP space, either using/working on free software or even IP, the issues shared have been known since 1994. And it does benefit the industry rather than the country. This way, the rent-seekers and monopolists win. There is ample literature that shares how rich countries had weak regulation for decades and even centuries till it was advantageous for them to have strong IP. One can look at the history of Europe and the United States for that. We can also look at the history of our neighbor China, which for the last 5 decades has used some provisions of IP and disregarded many others. But these words are of no use, as the policies made and shared are by the rich, for the rich.

Fighting between two State Borders Ironically, or because of it, two BJP-ruled states, Assam and Mizoram, fought between themselves, and 6 policemen died. While the history of the two states is complicated, it becomes a bit more complicated when one goes back into Assam and ULFA history and comes to know that ULFA could not have become that powerful unless the Marwaris, people of my clan, had given generous donations to them. They thought it was a good investment, which later would turn out to be untrue. Those who think ULFA has declined, or whatever, still don't have answers to this or this. Interestingly, both the Chief Ministers approached the Home Minister (Mr. Amit Shah) of the BJP. Mr. Shah was supposed to be the Chanakya, but in many instances, including this one, he decided to stay away. His statement was on the lines of "you guys figure it out yourself". There is a poem that was shared by the late poet Rahat Indori. I am sharing the same below as an image and will attempt to put a rough translation.
kisi ke baap ka hindustan todi hain Rahat Indori
Poets, whether in India or elsewhere, are known to speak truth to power and are a bit of a rebel. This poem by Rahat Indori is provocatively titled "Kisi ke baap ka Hindustan todi hai". It challenges the majoritarian idea that Hindustan/India only belongs to the majoritarian religion. He also challenges, and asserts at the same time, that every Indian citizen, regardless of whatever his or her religion might be, is an Indian and can assert India as his home. While the whole poem is compelling in itself, for me what hits home is in the second stanza:

"Lagegi Aag to aayege ghat kayi zad me, Yaha pe sirf hamara makan todi hai". The meaning is simple yet subtle: he uses Aag, or fire, as a symbol of hate, saying that if hate spreads, it won't be his home alone that will be torched. If one wants to literally understand what he meant, I present to you the cult Russian movie No Escapes, or Ogon as it is known in Russian. If one were to decipher why the Russian film doesn't talk about climate change, one has to view it from the prism of what their leader Vladimir Putin has said and done over the years. As can be seen even there, the situation is far more complex than one imagines. Although, it is interesting to note that he denied climate change being man-made till as late as last year and was on the side of Trump throughout his presidency. This was in 2017 as well as perhaps this. Interestingly, there was a change in tenor and note just a couple of weeks back, but that could be only politicking or much more. Statements that are not backed by legislation and application are usually just a whitewash. We would have to wait to see what concrete steps are taken by Putin, the Kremlin, and their Duma before saying either way.

Civil Aviation and the broad structure Civil Aviation is a large topic and I would not be able to do justice to it all in one article/blog post. So, for example, I will not be getting into aircraft (Boeing, Airbus, Comac, etc.) or the new electric aircraft, as that would just make the blog post long. I will also not be talking about cargo, visas, or many such topics, as all of them need their own space. This will be limited mostly to airports and, to some extent, airlines, as one cannot survive without the other. The primary reason for doing this is that there is and has been a lot of myth-making in India about Civil Aviation in general, whether it has to do with Civil Aviation history or whatever passes as policy in India.

A little early history Man has always looked at the stars and envisaged himself or herself as a bird, flying with gay abandon. In fact, there have been many paintings and sculptures imagining how we would fly. The steam engine itself was invented in 82 BCE. But an attempt at flight was made by a monk called Brother Elmer of Malmesbury in 1010, shortly after the birth of that rudimentary steam engine. The most famous of all would be Leonardo da Vinci, for his amazing sketches of flying machines in 1493. Cyrano de Bergerac apparently wrote two books, both sadly published after his death. Interestingly, you can find both the books and the gentleman in the Project Gutenberg archives. How much of M/s Cyrano's exploits were his own and how much was embellished by M/s Curtis, maybe a friend, a lover, who knows; but it does give the air of the swashbuckling adventurer of the time, which many men aspired to. So, why not an author?

L'Autre Monde: ou les États et Empires de la Lune (Comical History of the States and Empires of the Moon) and Les États et Empires du Soleil (The States and Empires of the Sun). These two French books apparently had a lot of references to flying machines. Both were authored by Cyrano de Bergerac and both were sadly published after his death, one apparently in 1656 and the other a couple of years later. By the 17th century, while it had become easy to know and measure latitude, measuring longitude was a problem. In fact, it can be argued, probably successfully, that India wouldn't have been under British rule, or the UK wouldn't have been a naval superpower, if it hadn't solved the longitude problem. Over the years, the British Royal Navy suffered many blows, one of the most famous or infamous among them being the Scilly naval disaster of 1707, which led to the death of some 2000 British Royal Navy personnel and led Queen Anne, who was ruling over England at that time, to pass via Parliament the Longitude Act, which was basically an open competition for anybody to fix the problem and carried prize money of £20,000. While nobody could claim the whole prize, many did get smaller amounts depending upon their achievements. The one who came closest was John Harrison, who made the first sea-watch, which with modifications over the years became miniaturized into the pocket-sized marine chronometer, although I doubt the ones used today look anything like those from that era. But if that had not been invented, we surely would have been freed long ago, the assumption being that the East India Company would have dashed onto rocks so many times that the whole exercise would have been futile. The downside is that the maritime trade routes being used today, and the commerce, would not have come to be; neither would aircraft or space flight, or at the very least they would have been delayed by who knows how many years or decades. If one wants to read about the longitude problem, one can get the famous book Longitude.

Many mythologies, including Indian and Arabian tales, have the flying carpet, which would let its passengers go from one place to the next. There is also the mention of the Pushpak Vimana in ancient texts, but those secrets remain secrets. Think how much foreign exchange India could make by both using it and exporting it worldwide. There are many who believe in it, but sadly, the ones who know the secret don't seem to want India's progress. Just think of the carbon credits that India could have, which by itself would make India a superpower. And I'm being serious.

Western Ideas and Implementation Even in the late 18th and early 19th centuries, there were many machines designed for controlled flight, but it was only the Wright Flyer that was able to demonstrate a controlled flight, in 1903. The ones who came closest to what the Wrights achieved were Cayley and Langley. The Wrights studied what those pioneers had done. They looked at what Otto Lilienthal had done, as he had done a lot of hang-gliding and had put a lot of literature into the public domain.

Furthermore, they also consulted Octave Chanute. The whole system and history of the same is a bit complicated, but it does give a window into what happened then. So it won't be wrong to say that what the Wright Brothers accomplished would probably not have been possible, or would have taken years or maybe even decades more, if that literature, those experiments, drawings, etc. had not been available in the commons. So, while they did their own experimentation, they also looked at what other people were doing and had done, which was in the public domain/commons.

They also did a lot of testing, which gave them new insights. Even the propulsion system they used in the 1903 flight was based on a design by Nicolaus Otto. In fact, the aircraft would not have been born if the Chinese had not invented kites in the early sixth century A.D. One also has to credit Isaac Newton for the three laws of motion, without which none of the above could have happened. What is credited to the Wright brothers is not just that they made the Kitty Hawk flights; they also made it commercial, as they sold the design and variations of it to the American military, and they established a pilot school where pilots were trained for war-fighting. Some 119 pilots came out of that school. The Wrights thought that air supremacy would end the war early, but this turned out to be a false hope.

Competition and Those Magnificent Men and their Flying Machines One of the first competitions to unlock creativity was the English Channel crossing prize offered by the Daily Mail. The crossing was successfully done by the Frenchman Louis Blériot; you can read his account here. There were quite a few competitions before World War 1 broke out. There is a beautiful, humorous movie dedicated to imagining how things went in that era. In fact, there have been two movies; this one and an earlier movie called Sky Riders made many a youth dream. The other movie sadly is not yet in the public domain, and when it will be nobody knows, but if you see it or even read it, it gives you goosebumps.

World War 1 and Improvements to Aircraft World War 1 is remembered as the Great War or, in an attempt at irony, the War to end all wars. It caused a lot of destruction of both people and property, and in fact laid the foundation of World War 2. At the same time, if World War 1 hadn't happened, airpower and plane technology would have taken decades longer to develop. Even medicine and medical techniques advanced dramatically because of World War 1. In order to be brief, I am not sharing much more about World War 1; otherwise that would become its own blog post. It had its heroes and villains, but the who, when, and why could perhaps be tackled another time.

The Guggenheim Family and the birth of Civil Aviation If one has to credit one family for the birth of Civil Aviation, it has to be the Guggenheim family. Again, I would not like to dwell on this too much, as much of their contribution has already been noted here, but there are quite a few things that still need to be said and pointed out. First and foremost is the fact that they put lessons about flying into the syllabus from grade school through college and beyond, whereas in the Indian schooling system there is nothing like that to date. Here in India, even in engineering courses, you don't get much of this unless and until you go for professional aviation or aeronautical courses, and most of those courses cost a bomb, so only the very rich or the very determined (with loans) go for them, at least that's what my friends have shared. And there is no guarantee you will get a job after that, especially in today's climate. Their funds, grants, and prizes were given to various people so that improvements could be made to United States Civil Aviation. This, as shared in the report/blog post linked above, was in response to what the younger brother saw as Europe having a large advantage in both military and civil aviation. They also made several grants to universities, which would not only do notable work during the family's lifetime but also carry on the legacy, researching different aspects of aircraft. One point that should be noted is that Europe was far ahead of the U.S. even then, which is what prompted the younger son. There had already been talk of civil/civilian flights on European routes, although much different from what either of us can imagine today. Even with everything that the U.S. had going for her, and still has, Europe is the one with better airports, better facilities, better everything than the U.S. has even today. If you look at lists of airports ranked by value for money or facilities, you will find many airports from Europe, some from Asia, and only a few from the U.S., even though Americans are some of the most frequent users of the service. But that debate and those arguments I will have to leave for perhaps the next blog post, as there is still a lot to be covered between the 1930s, the 1950s, and today. The Guggenheim archives do a fantastic job of sharing part of the story till the 1950s, but there is also quite a bit they don't. I will probably start from there in the next blog post and then carry on. Lastly, before I wind up, I have to share why I felt the need to write, capture, and share this part of aviation history. The plain and simple reason is that many of the people I meet, whether on the web, on Twitter, or in real life, are just unaware of how this whole thing came about. The unawareness among my fellow brothers and sisters is just shocking, overwhelming. By sharing these articles, I would at least be able to guide them, or let them know how it all came to be and where things are going, and not just be so clueless. Till later.

4 August 2021

Dirk Eddelbuettel: x13binary 1.1.57-1 on CRAN: New Upstream, New M1 Binary

Christoph and I are pleased to share that a new release 1.1.57-1 of x13binary, of the X-13ARIMA-SEATS program by the US Census Bureau (with updated upstream release 1.1.57), is now on CRAN. The x13binary package takes the pain out of installing X-13ARIMA-SEATS by making it a fully resolved CRAN dependency. For example, when installing the excellent seasonal package by Christoph, X-13ARIMA-SEATS will get pulled in via the x13binary package and things just work. Just depend on x13binary and on all major OSs supported by R you should have an X-13ARIMA-SEATS binary installed which will be called seamlessly by the higher-level packages such as seasonal or gunsales. With this, the full power of what is likely the world's most sophisticated deseasonalization and forecasting package is now at your fingertips and the R prompt, just like any other of the 17960+ CRAN packages. You can read more about this (and the seasonal package) in the Journal of Statistical Software paper by Christoph and myself. This release brings a new upstream release as well as new binaries. We continue to support two Linux flavours (the standard x86_64 as well as armv7l), Windows, and for the first time two macOS flavours: in addition to the existing Intel binary we now have a native build for the arm64 M1 chip (with thanks to Kirill for the assist). We still lack a genuine binary for Solaris, so if any of the esteemed readers of this post happens to have access to R on Solaris along with a basic Fortran compiler, we would love to hear from you. Building X-13ARIMA-SEATS from source on Solaris should be as straightforward as it is on the other OSs. Or if someone with a bit of time wants to help by following Gabor's tutorial, we would greatly appreciate it. Courtesy of my CRANberries, there is also a diffstat report for this release showing changes to the previous release. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Junichi Uekawa: Wrote a tool to parse /sys/block/*/stat.

Wrote a tool to parse /sys/block/*/stat. It's probably impossible for a human brain to appreciate the raw numbers, so I made a web page where you can paste the contents and it parses them in JS to emit some processed numbers. Probably iostat is the tool you want, but hey, sometimes you need this kind of stuff.
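For reference, a rough shell equivalent of the field labelling, assuming the 11 classic fields documented in the kernel's Documentation/block/stat (newer kernels append discard and flush fields after these):
for dev in /sys/block/*/stat; do
    echo "== $dev =="
    # fields 1-4: reads, 5-8: writes, 9-11: in-flight / io_ticks / time_in_queue
    awk '{ printf "read:  ios=%s merges=%s sectors=%s ticks=%s\n", $1, $2, $3, $4;
           printf "write: ios=%s merges=%s sectors=%s ticks=%s\n", $5, $6, $7, $8;
           printf "in_flight=%s io_ticks=%s time_in_queue=%s\n", $9, $10, $11 }' "$dev"
done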

19 July 2021

Antonio Terceiro: Getting help with autopkgtest for your package

If you have been involved in Debian packaging at all in the last few years, you are probably aware that autopkgtest is now an important piece of the Debian release process. Back in 2018, the automated testing migration process started considering autopkgtest test results as part of its decision making. Since then, this process has received several improvements. For example, during the bullseye freeze, non-key packages with a non-trivial autopkgtest test suite could migrate automatically to testing without their maintainers needing to open unblock requests, provided there was no regression in their autopkgtest (or those of their reverse dependencies). Since 2014, when ci.debian.net was first introduced, we have seen an amazing increase in the number of packages in Debian that can be automatically tested. We went from around 100 to 15,000 today. This means not only happier maintainers because their packages get to testing faster, but also improved quality assurance for Debian as a whole. Chart showing the number of packages tested by ci.debian.net, starting from close to 0 in 2014, up to 15,000 in 2021; the growth tendency seems to slow down in the last year. However, the growth rate seems to be decreasing. Maybe the low-hanging fruit have all been picked, or maybe we just need to help more people jump on the automated testing bandwagon. With that said, we would like to encourage and help more maintainers to add autopkgtest to their packages. To that effect, I just created the autopkgtest-help repository on salsa, where we will take help requests from maintainers working on autopkgtest for their packages. If you want help, please go ahead and create an issue in there. To quote the repository README:
Valid requests:
  • "I want to add autopkgtest to package X. X is a tool that [...] and it works by [...]. How should I approach testing it?" It's OK if you have no idea where to start. But at least try to describe your package, what it does and how it works so we can try to help you.
  • "I started writing autopkgtest for X, here is my current work in progress [link]. But I encountered problem Y. How to I move forward?" If you already have an autopkgtest but is having trouble making it work as you think it should, you can also ask here.
Invalid requests:
  • "Please write autopkgtest for my package X for me". As with anything else in free software, please show appreciation for other people's time, and do your own research first. If you pose your question with enough details (see above) and make it interesting, it may be that whoever answers will write at least a basic structure for you, but as the maintainer you are still the expert in the package and what tests are relevant.
If you ask your question soon, you might get your answer recorded in video: we are going to have a DebConf21 talk next month, where I and Paul Gevers (elbrus) will answer a few autopkgtest questions in video for posterity. Now, if you have experience enabling autopkgtest for your own packages, please consider watching that repository to help us help our fellow maintainers. To give an idea of the scale involved, a minimal example of what an autopkgtest can look like is sketched below.
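This is only a sketch; the package name and the commands exercised are placeholders that you would replace with something meaningful for your package:
# debian/tests/control
Tests: smoke
Depends: @

# debian/tests/smoke (must be executable)
#!/bin/sh
set -e
# exercise the installed package, not the build tree
mypackage --version
mypackage --help >/dev/null
The "Depends: @" line pulls in the binary packages built from the source, so the test runs against what users would actually install.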

10 July 2021

Laura Arjona Reina: Android backups with rsync

A quick note to self to remind me how I do backups of my Android device with rsync (and adb). I have followed this guide: How to use rsync over USB on Android with adb. My personal notes, starting with the rsyncd.conf placed on the device: address = 127.0.0.1
port = 1873
uid = 0
gid = 0
[root]
path = /
use chroot = false
read only = false
adb shell /data/local/tmp/rsync --daemon --no-detach --config=/sdcard/rsyncd.conf --log-file=/proc/self/fd/2 didn't work; it produced the message "@ERROR: protocol startup error", so I ended up doing: adb shell
rsync --daemon --no-detach --config=/sdcard/rsyncd.conf --log-file=/sdcard/rsync.log
and opened another tab to perform the rsync commands from my laptop: rsync -av --progress --stats rsync://localhost:6010/root/storage .
rsync -av --progress --stats rsync://localhost:6010/root/data .
Then I saw that rsync was copying the symlinks instead of their contents: /storage/self/primary was a broken link to /mnt/user/0/primary So I ran again the commands with -LK: rsync -av --progress --stats -LK rsync://localhost:6010/root/storage .
rsync -av --progress --stats -LK rsync://localhost:6010/root/data .
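The rsync URLs above use localhost port 6010 while the daemon listens on port 1873 on the device; that mapping comes from an adb port forward, roughly like this (adjust the ports as needed):
adb forward tcp:6010 tcp:1873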
and now I have a copy of all the files I'm interested in. In addition to this, I run an adb backup of the system: adb backup -f ./adb_backup_apk_shared_all_system.ad -apk -shared -all -system and I think that's all I need in case I want to remove stuff from my phone or some disaster happens.

29 June 2021

Antoine Beaupr : Another syncmaildir crash

So I had another major email crash with my syncmaildir setup. This time I was at least able to confirm the issue, and I still haven't lost mail thanks to backups and sheer luck (again).

The crash It is not really worth going over the crash in details, it's fairly similar to the last one: something bad happened and smd started destroying everything. The hint is that it takes a long time to do what usually takes seconds. It helps that I now have a second monitor showing logs. I still lost much more mail than the last time. I used to have "301 723 messages", according to notmuch. But then when I ran smd-pull by hand, it was telling me:
95K emails scanned
Oops. You can see notmuch happily noticing the destroyed files on the server:
jun 28 16:33:40 marcos notmuch[28532]: No new mail. Removed 65498 messages. Detected 1699 file renames.
jun 28 16:36:05 marcos notmuch[29746]: No new mail. Removed 68883 messages. Detected 2488 file renames.
jun 28 16:41:40 marcos notmuch[31972]: No new mail. Removed 118295 messages. Detected 3657 file renames.
The final count ended up being 81 042 messages, according to notmuch. A whopping 220 000 mails deleted. The interesting bit, this time around, is that I caught smd in the act of running two processes in parallel:
jun 28 16:30:09 curie systemd[2845]: Finished pull emails with syncmaildir. 
jun 28 16:30:09 curie systemd[2845]: Starting push emails with syncmaildir... 
jun 28 16:30:09 curie systemd[2845]: Starting pull emails with syncmaildir... 
So clearly that is the source of the bug.
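I have not tried this yet, but one obvious way to rule out that particular race would be to make the two user units share a lock, for example with a systemd drop-in wrapping each command in flock(1). A sketch, with the unit names and paths assumed from the logs above:
# ~/.config/systemd/user/smd-pull.service.d/override.conf (same idea for smd-push)
[Service]
ExecStart=
ExecStart=/usr/bin/flock /home/anarcat/.smd/systemd.lock /usr/bin/smd-pull
flock blocks until the lock is free, so pull and push would run one after the other instead of concurrently.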

Recovery Emergency stop on curie:
notmuch dump > notmuch.dump
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
On marcos (the server), guessed the number of messages delivered since the last backup to be 71, just looking at timestamps in the mail log. Made a list:
grep postfix/local /var/log/mail.log | tail -71 > lost-mail
Found postfix queue IDs:
sed 's/.*\]://;s/:.*//' lost-mail > qids
Turn those into message IDs, find those that are missing from the disk (had previously ran notmuch new just to be sure it's up to date):
while read qid ; do 
    grep "$qid: message-id" /var/log/mail.log
done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do
    sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid
done
Copy this back on curie as missing-msgids and:
$ wc -l missing-msgids 
48 missing-msgids
$ while read msgid ; do notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done < missing-msgids
mailman.189.1624881611.23397.nodes-reseaulibre.ca@reseaulibre.ca
AnwMy7rdSpK-N-vt4AiOag@ismtpd0148p1mdw1.sendgrid.net
only two mails missing! whoohoo! Copy those back onto marcos as really-missing-msgids, and look at the full mail logs to see what they are:
~anarcat/src/koumbit-scripts/mail/postfix-trace --from-file really-missing-msgids2
I actually remembered deleting those, so no mail lost! Rebuild the list of msgids that were lost, on marcos:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//'
Copy that on curie as lost-mail-msgids, then copy the files over in a test dir:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:restore/Maildir-angela/
If that looks about right, on marcos:
find restore/Maildir-angela/ -type f | wc -l
... should match the number of missing mails, roughly. Copy it into the real spool:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:Maildir/
Then on the server, notmuch new should find the new emails, and we shouldn't have any lost mail anymore:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done
Then, crucial moment, try to pull the new mails from the backups on curie:
anarcat@curie:~(main)$ smd-pull  -n  --show-tags -v
Found lockfile of a dead instance. Ignored.
Phase 0: handshake
Phase 1: changes detection
    5K emails scanned
   10K emails scanned
   15K emails scanned
   20K emails scanned
   25K emails scanned
   30K emails scanned
   35K emails scanned
   40K emails scanned
   45K emails scanned
   50K emails scanned
Phase 2: synchronization
Phase 3: agreement
default: smd-client@localhost: TAGS: stats::new-mails(49687), del-mails(0), bytes-received(215752279), xdelta-received(3703852)
"smd-pull  -n  --show-tags -v" took 3 mins 39 secs
This brought me back to the state after the backup plus the mails delivered during the day, which means I had to catchup with all my holiday's read emails (1440 mails!) but thankfully I made a dump of the notmuch database on curie at the start of the procedure, so this actually restored a sane state:
pv notmuch.dump | notmuch restore
Phew!

Workaround I have filed this as a bug in upstream issue 18. Considering I filed 11 issues and only 3 of those were closed, I'm not holding my breath. I nevertheless filed PR 19 in the hope that this will fix my particular issue, but I'm not even sure this is the right fix...

Fix At this point, I'm really ready to give up on SMD. It's really, really nice to be able to sync mail over SSH because I don't need to store my IMAP password on disk. But surely there are more reliable syncing mechanisms. I do not remember ever losing that much mail before. At worst, offlineimap would duplicate emails like mad, but never destroy my entire mail spool that way. As mentioned before, there are other programs that sync mail. I'm looking at:
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, see comment below
  • isync/mbsync: might be faster, I remember having trouble switching from offlineimap to this, has support for TLS client certs, running over SSH, and generally has good words from multiple Debian and notmuch people (a minimal configuration sketch follows this list)
  • getmail: just downloads email, might not be enough
  • nncp: treat the local spool as another mail server, might not be compatible with my "multiple clients" setup
  • doveadm-sync: requires dovecot on both ends, but supports using SSH to sync, will try this next, may have performance problems, see comment below
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, seems awfully complicated to setup, mentions rsmtp which is a nice name for rsendmail
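For the record, a minimal ~/.mbsyncrc sketch for the isync/mbsync option, tunnelled over SSH so no IMAP password needs to be stored on disk; the host name and the dovecot imap path here are assumptions to adapt:
IMAPAccount ssh-account
# open the IMAP session over an SSH pipe instead of a TCP socket
Tunnel "ssh -q shell.example.com /usr/lib/dovecot/imap"

IMAPStore remote
Account ssh-account

MaildirStore local
Path ~/Maildir/
Inbox ~/Maildir/INBOX
SubFolders Verbatim

Channel mail
Far :remote:
Near :local:
Patterns *
Sync All
Expunge Both
SyncState *
Running mbsync mail would then do a full bidirectional sync of that channel.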

14 June 2021

François Marier: Self-hosting an Ikiwiki blog

8.5 years ago, I moved my blog to Ikiwiki and Branchable. It's now time for me to take the next step and host my blog on my own server. This is how I migrated from Branchable to my own Apache server.

Installing Ikiwiki dependencies Here are all of the extra Debian packages I had to install on my server:
apt install ikiwiki ikiwiki-hosting-common gcc libauthen-passphrase-perl libcgi-formbuilder-perl libcrypt-ssleay-perl libjson-xs-perl librpc-xml-perl python-docutils libxml-feed-perl libsearch-xapian-perl libmailtools-perl highlight-common libsearch-xapian-perl xapian-omega
apt install --no-install-recommends ikiwiki-hosting-web libgravatar-url-perl libmail-sendmail-perl libcgi-session-perl
apt purge libnet-openid-consumer-perl
Then I enabled the CGI module in Apache:
a2enmod cgi
and un-commented the following in /etc/apache2/mods-available/mime.conf:
AddHandler cgi-script .cgi

Creating a separate user account Since Ikiwiki needs to regenerate my blog whenever a new article is pushed to the git repo or a comment is accepted, I created a restricted user account for it:
adduser blog
adduser blog sshuser
chsh -s /usr/bin/git-shell blog

git setup Thanks to Branchable storing blogs in git repositories, I was able to import my blog using a simple git clone in /home/blog (the srcdir):
git clone --bare git://feedingthecloud.branchable.com/ source.git
Note that the name of the directory (source.git) is important for the ikiwikihosting plugin to work. Then I pulled the .setup file out of the setup branch in that repo and put it in /home/blog/.ikiwiki/FeedingTheCloud.setup. After that, I deleted the setup branch and the origin remote from that clone:
git branch -d setup
git remote rm origin
Following the recommended git configuration, I created a working directory (the repository) for the blog user to modify the blog as needed:
cd /home/blog/
git clone /home/blog/source.git FeedingTheCloud
I added my own ssh public key to /home/blog/.ssh/authorized_keys so that I could push to the srcdir from my laptop. Finally, I generated a new ssh key without a passphrase:
ssh-keygen -t ed25519
and added it as deploy key to the GitHub repo which acts as a read-only mirror of my blog.

Ikiwiki config While I started with the Branchable setup file, I changed the following things in it:
adminemail: webmaster@fmarier.org
srcdir: /home/blog/FeedingTheCloud
destdir: /var/www/blog
url: https://feeding.cloud.geek.nz
cgiurl: https://feeding.cloud.geek.nz/blog.cgi
cgi_wrapper: /var/www/blog/blog.cgi
cgi_wrappermode: 675
add_plugins:
- goodstuff
- lockedit
- comments
- blogspam
- sidebar
- attachment
- favicon
- format
- highlight
- search
- theme
- moderatedcomments
- flattr
- calendar
- headinganchors
- notifyemail
- anonok
- autoindex
- date
- relativedate
- htmlbalance
- pagestats
- sortnaturally
- ikiwikihosting
- gitpush
- emailauth
disable_plugins:
- brokenlinks
- fortune
- more
- openid
- orphans
- passwordauth
- progress
- recentchanges
- repolist
- toggle
- txt
sslcookie: 1
cookiejar:
  file: /home/blog/.ikiwiki/cookies
useragent: ikiwiki
git_wrapper: /home/blog/source.git/hooks/post-update
urlalias:
- http://feeds.cloud.geek.nz/
- http://www.feeding.cloud.geek.nz/
owner: francois@fmarier.org
hostname: feeding.cloud.geek.nz
emailauth_sender: login@fmarier.org
allowed_attachments: admin()
Then I created the destdir:
mkdir /var/www/blog
chown blog:blog /var/www/blog
and generated the initial copy of the blog as the blog user:
ikiwiki --setup .ikiwiki/FeedingTheCloud.setup --wrappers --rebuild
One thing that failed to generate properly was the tag cloud (from the pagestats plugin). I have not been able to figure out why it fails to generate any output when run this way, but if I push to the repo and let the git hook handle the rebuilding of the wiki, the tag cloud is generated correctly. Consequently, fixing this is not high on my list of priorities, but if you happen to know what the problem is, please reach out.

Apache config Here's the Apache config I put in /etc/apache2/sites-available/blog.conf:
<VirtualHost *:443>
    ServerName feeding.cloud.geek.nz
    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/feeding.cloud.geek.nz/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/feeding.cloud.geek.nz/privkey.pem
    Header set Strict-Transport-Security: "max-age=63072000; includeSubDomains; preload"
    Include /etc/fmarier-org/blog-common
</VirtualHost>
<VirtualHost *:443>
    ServerName www.feeding.cloud.geek.nz
    ServerAlias feeds.cloud.geek.nz
    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/feeding.cloud.geek.nz/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/feeding.cloud.geek.nz/privkey.pem
    Redirect permanent / https://feeding.cloud.geek.nz/
</VirtualHost>
<VirtualHost *:80>
    ServerName feeding.cloud.geek.nz
    ServerAlias www.feeding.cloud.geek.nz
    ServerAlias feeds.cloud.geek.nz
    Redirect permanent / https://feeding.cloud.geek.nz/
</VirtualHost>
and the common config I put in /etc/fmarier-org/blog-common:
ServerAdmin webmaster@fmarier.org
DocumentRoot /var/www/blog
LogLevel core:info
CustomLog ${APACHE_LOG_DIR}/blog-access.log combined
ErrorLog ${APACHE_LOG_DIR}/blog-error.log
AddType application/rss+xml .rss
<Location /blog.cgi>
        Options +ExecCGI
</Location>
before enabling all of this using:
a2ensite blog
apache2ctl configtest
systemctl restart apache2.service
The feeds.cloud.geek.nz domain used to be pointing to Feedburner and so I need to maintain it in order to avoid breaking RSS feeds from folks who added my blog to their reader a long time ago.

Server-side improvements Since I'm now in control of the server configuration, I was able to make several improvements to how my blog is served. First of all, I enabled the HTTP/2 and Brotli modules:
a2enmod http2
a2enmod brotli
and enabled Brotli compression by putting the following in /etc/apache2/conf-available/compression.conf:
<IfModule mod_brotli.c>
  <IfDefine !TRANSFER_COMPRESSION>
    Define TRANSFER_COMPRESSION BROTLI_COMPRESS
  </IfDefine>
</IfModule>
<IfModule mod_deflate.c>
  <IfDefine !TRANSFER_COMPRESSION>
    Define TRANSFER_COMPRESSION DEFLATE
  </IfDefine>
</IfModule>
<IfDefine TRANSFER_COMPRESSION>
  <IfModule mod_filter.c>
    AddOutputFilterByType ${TRANSFER_COMPRESSION} text/html text/plain text/xml text/css text/javascript
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/x-javascript application/javascript application/ecmascript
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/rss+xml
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/xml
  </IfModule>
</IfDefine>
and replacing /etc/apache2/mods-available/deflate.conf with the following:
# Moved to /etc/apache2/conf-available/compression.conf as per https://bugs.debian.org/972632
before enabling this new config:
a2enconf compression
Next, I made my blog available as a Tor onion service by putting the following in /etc/apache2/sites-available/blog.conf:
<VirtualHost *:443>
    ServerName feeding.cloud.geek.nz
    ServerAlias xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion
    Header set Onion-Location "http://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion%{REQUEST_URI}s"
    Header set alt-svc 'h2="xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion:443"; ma=315360000; persist=1'
    ... 
<VirtualHost *:80>
    ServerName xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion
    Include /etc/fmarier-org/blog-common
</VirtualHost>
Then I followed the Mozilla Observatory recommendations and enabled the following security headers:
Header set Content-Security-Policy: "default-src 'none'; report-uri https://fmarier.report-uri.com/r/d/csp/enforce ; style-src 'self' 'unsafe-inline' ; img-src 'self' https://seccdn.libravatar.org/ ; script-src https://feeding.cloud.geek.nz/ikiwiki/ https://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion/ikiwiki/ http://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion/ikiwiki/ 'unsafe-inline' 'sha256-pA8FbKo4pYLWPDH2YMPqcPMBzbjH/RYj0HlNAHYoYT0=' 'sha256-Kn5E/7OLXYSq+EKMhEBGJMyU6bREA9E8Av9FjqbpGKk=' 'sha256-/BTNlczeBxXOoPvhwvE1ftmxwg9z+WIBJtpk3qe7Pqo=' ; base-uri 'self'; form-action 'self' ; frame-ancestors 'self'"
Header set X-Frame-Options: "SAMEORIGIN"
Header set Referrer-Policy: "same-origin"
Header set X-Content-Type-Options: "nosniff"
Note that the Mozilla Observatory is mistakenly identifying HTTP onion services as insecure, so you can ignore that failure. I also used the Mozilla TLS config generator to improve the TLS config for my server. Then I added security.txt and gpc.json to the root of my git repo and then added the following aliases to put these files in the right place:
Alias /.well-known/gpc.json /var/www/blog/gpc.json
Alias /.well-known/security.txt /var/www/blog/security.txt
I also followed these instructions to create a sitemap for my blog with the following alias:
Alias /sitemap.xml /var/www/blog/sitemap/index.rss
Finally, I simplified a few error pages to save bandwidth:
ErrorDocument 301 " "
ErrorDocument 302 " "
ErrorDocument 404 "Not Found"

Monitoring 404s Another advantage of running my own web server is that I can monitor the 404s easily using logcheck by putting the following in /etc/logcheck/logcheck.logfiles:
/var/log/apache2/blog-error.log 
Based on that, I added a few redirects to point bots and users to the location of my RSS feed:
Redirect permanent /atom /index.atom
Redirect permanent /comments.rss /comments/index.rss
Redirect permanent /comments.atom /comments/index.atom
Redirect permanent /FeedingTheCloud /index.rss
Redirect permanent /feed /index.rss
Redirect permanent /feed/ /index.rss
Redirect permanent /feeds/posts/default /index.rss
Redirect permanent /rss /index.rss
Redirect permanent /rss/ /index.rss
and to tell them to stop trying to fetch obsolete resources:
Redirect gone /~ff/FeedingTheCloud
Redirect gone /gittip_button.png
Redirect gone /ikiwiki.cgi
I also used these 404s to discover a few old Feedburner URLs that I could redirect to the right place using archive.org:
Redirect permanent /feeds/1572545745827565861/comments/default /posts/watch-all-of-your-logs-using-monkeytail/comments.atom
Redirect permanent /feeds/1582328597404141220/comments/default /posts/news-feeds-rssatom-for-mythtvorg-and/comments.atom
...
Redirect permanent /feeds/8490436852808833136/comments/default /posts/recovering-lost-git-commits/comments.atom
Redirect permanent /feeds/963415010433858516/comments/default /posts/debugging-openwrt-routers-by-shipping/comments.atom
I also put the following robots.txt in the git repo in order to stop a bunch of authentication errors coming from crawlers:
User-agent: *
Disallow: /blog.cgi
Disallow: /ikiwiki.cgi

Future improvements There are a few things I'd like to improve on my current setup. The first one is to remove the ikiwikihosting and gitpush plugins and replace them with a small script which would simply git push to the read-only GitHub mirror. Then I could uninstall the ikiwiki-hosting-common and ikiwiki-hosting-web packages since that's all I use them for. Next, I would like to have proper support for signed git pushes. At the moment, I have the following in /home/blog/source.git/config:
[receive]
    advertisePushOptions = true
    certNonceSeed = "(random string)"
but I'd like to also reject unsigned pushes. While my blog now has a CSP policy which doesn't rely on unsafe-inline for scripts, it does rely on unsafe-inline for stylesheets. I tried to remove this but the actual calls to allow seemed to be located deep within jQuery and so I gave up. Update: now fixed. Finally, I'd like to figure out a good way to deal with articles which don't currently have comments. At the moment, if you try to subscribe to their comment feed, it returns a 404. For example:
[Sun Jun 06 17:43:12.336350 2021] [core:info] [pid 30591:tid 140253834704640] [client 66.249.66.70:57381] AH00128: File does not exist: /var/www/blog/posts/using-iptables-with-network-manager/comments.atom
This is obviously not ideal since many feed readers will refuse to add a feed which is currently not found even though it could become real in the future. If you know of a way to fix this, please let me know.
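Coming back to the signed pushes mentioned in the future improvements above, git exposes the push certificate to server-side hooks, so a minimal pre-receive hook along these lines (a sketch I have not deployed) should be enough to reject unsigned pushes:
#!/bin/sh
# /home/blog/source.git/hooks/pre-receive
# GIT_PUSH_CERT is only set when the client pushed with "git push --signed"
if [ -z "$GIT_PUSH_CERT" ]; then
    echo "rejected: only signed pushes (git push --signed) are accepted" >&2
    exit 1
fi
# and only accept certificates with a good GPG signature
if [ "$GIT_PUSH_CERT_STATUS" != "G" ]; then
    echo "rejected: push certificate signature did not verify" >&2
    exit 1
fi
exit 0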

13 June 2021

Vincent Fourmond: Solution for QSoas quiz #2: averaging several Y values for the same X value

This post describes two similar solutions to the Quiz #2, using the data files found there. The two solutions described here rely on split-on-values. The first solution is the one that came naturally to me, and is by far the most general and extensible, but the second one is shorter, and doesn't require external script files.
Solution #1 The key to both solutions is to separate the original data into a series of datasets that only contain data at a fixed value of x (which corresponds here to a fixed pH), and then to process each dataset one by one to extract the average and standard deviation. This first step is done thus:
QSoas> load kcat-vs-ph.dat
QSoas> split-on-values pH x /flags=data
After these commands, the stack contains a series of datasets bearing the data flag, each containing a single column of data, as can be seen from the beginning of a show-stack command:
QSoas> k
Normal stack:
	 F  C	Rows	Segs	Name	
#0	(*) 1	43	1	'kcat-vs-ph_subset_22.dat'
#1	(*) 1	44	1	'kcat-vs-ph_subset_21.dat'
#2	(*) 1	43	1	'kcat-vs-ph_subset_20.dat'
...
Each of these datasets has a meta-data named pH whose value is the original x value from kcat-vs-ph.dat. Now, the idea is to run a stats command on the resulting datasets, extracting the average value of x and its standard deviation, together with the value of the meta pH. The most natural and general way to do this is to use run-for-datasets, using the following script file (named process-one.cmds):
stats /meta=pH /output=true /stats=x_average,x_stddev
So the command looks like:
QSoas> run-for-datasets process-one.cmds flagged:data
This command produces an output file containing, for each flagged dataset, a line containing x_average, x_stddev, and pH. Then, it is just a matter of loading the output file and shuffling the columns in the right order to get the data in the form asked. Overall, this looks like this:
l kcat-vs-ph.dat
split-on-values pH x /flags=data
output result.dat /overwrite=true
run-for-datasets process-one.cmds flagged:data
l result.dat
apply-formula tmp=y2;y2=y;y=x;x=tmp
dataset-options /yerrors=y2
The slight improvement over what is described above is the use of the output command to write the output to a dedicated file (here result.dat), instead of out.dat and ensuring it is overwritten, so that no data remains from previous runs.

Solution #2 The second solution is almost the same as the first one, with two improvements: the stats command is run directly on all the flagged datasets using /buffers=flagged:data, which removes the need for run-for-datasets and the external script file, and its results are accumulated using /accumulate=*, which removes the need for an output file. This yields the following, smaller, solution:
l kcat-vs-ph.dat
split-on-values pH x /flags=data
stats /meta=pH /accumulate=* /stats=x_average,x_stddev /buffers=flagged:data
pop
apply-formula tmp=y2;y2=y;y=x;x=tmp
dataset-options /yerrors=y2


About QSoas QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050-5052. Current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

6 June 2021

Russell Coker: Netflix and IPv6

It seems that Netflix has an ongoing issue of not working well with IPv6; apparently they have some sort of region-checking code that doesn't correctly identify IPv6 prefixes. To fix this I wrote the following script to make a small zone file with only A records for Netflix and no AAAA records. The $OUT.header file just has the SOA record for my fake netflix.com domain.
#!/bin/bash
OUT=/etc/bind/data/netflix.com
HEAD=$OUT.header
cp $HEAD $OUT
dig -t a www.netflix.com @8.8.8.8 | sed -n -e "s/^.*IN/www IN/p" | grep [0-9]$ >> $OUT
dig -t a android.prod.cloud.netflix.com @8.8.8.8 | sed -n -e "s/^.*IN/android.prod.cloud IN/p" | grep [0-9]$ >> $OUT
/usr/sbin/rndc reload > /dev/null
Update I updated this post to add a line for android.prod.cloud.netflix.com which is the address used by Android devices.
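For completeness, the $OUT.header file can be as simple as the following sketch; the name server name and the serial are just placeholders, and the NS record is there because BIND wants one at the zone apex:
$TTL 300
@   IN  SOA ns.example.com. hostmaster.example.com. (
        2021060601 ; serial
        3600       ; refresh
        900        ; retry
        604800     ; expire
        300 )      ; negative cache TTL
    IN  NS  ns.example.com.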

30 May 2021

Vincent Fourmond: QSoas quiz #2: averaging several Y values for the same X value

This second quiz may sound like the first one, but in fact, the approach used is completely different. The point is to gather some elementary statistics from a series of experiments performed under different conditions, but with several repeats at the same conditions.
Quiz You are given a file (which you can download there) that contains a series of pH value data: the X column is the pH, the Y column the result of the experiment at the given pH (let's say the measure of the catalytic rate of an enzyme). Your task is to take this data and produce a single dataset which contains, for each pH value, the pH, the average of the results at that pH and the standard deviation. The result should be identical to the following file, and should look like that:
There are several ways to do this, but all ways must rely on stats, and the more natural way in QSoas is to take advantage of split-on-values, which is a very powerful command but somehow hard to master, which is the point of this Quiz.
By the way, the data file is purely synthetic, if you look in the GitHub repository, you'll see how it was generated.

About QSoas QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050-5052. Current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

10 May 2021

Russell Coker: Minikube and Debian

I just started looking at the Kubernetes documentation and interactive tutorial [1], which incidentally is really good. Everyone who is developing a complex system should look at this to get some ideas for online training. Here are some notes on setting it up on Debian. Add Kubernetes Apt Repository
deb https://apt.kubernetes.io/ kubernetes-xenial main
First add the above to your apt sources configuration (/etc/apt/sources.list or some file under /etc/apt/sources.list.d) for the kubectl package. Ubuntu Xenial is near enough to Debian/Buster and Debian/Unstable that it should work well for both of them. Then install the GPG key 6A030B21BA07F4FB for use by apt:
gpg --recv-key 6A030B21BA07F4FB
gpg --list-sigs 6A030B21BA07F4FB
gpg --export 6A030B21BA07F4FB | apt-key add -
The Google key in question is not signed. Install Packages for the Tutorial The online training is based on minikube, which uses libvirt to setup a KVM virtual machine to do stuff. To get this running you need to have a system that is capable of running KVM (i.e. the BIOS is set to allow hardware virtualisation). It MIGHT work on QEMU software emulation without KVM support (technically it's possible but it would be slow and require some code to handle that), I didn't test if it does. Run the following command to install libvirt, kvm, and dnsmasq (which minikube requires) and kubectl on Debian/Buster:
apt install libvirt-clients libvirt-daemon-system qemu-kvm dnsmasq kubectl
For Debian/Unstable run the following command:
apt install libvirt-clients libvirt-daemon-system qemu-system-x86 dnsmasq kubectl
To run libvirt as non-root without needing a password for everything you need to add the user in question to the libvirt group. I recommend running things as non-root whenever possible. In this case entering a password for everything will probably be more pain than you want. The Debian Wiki page about KVM [2] is worth reading. Install Minikube Test Environment Here is the documentation for installing Minikube [3]. Basically just download a single executable from the net, put it in your $PATH, and run it. Best to use non-root for that. Also you need at least 3G of temporary storage space in the home directory of the user that runs it. After installing minikube run minikube start which will download container image data and start it up. Then you can run commands like the following to see what it has done.
# get overview of virsh commands
virsh help
# list domains
virsh --connect qemu:///system list
# list block devices a domain uses
virsh --connect qemu:///system domblklist minikube
# show stats on block device usage
virsh --connect qemu:///system domblkstat minikube hda
# list virtual networks
virsh --connect qemu:///system net-list
# list dhcp leases on a virtual network
virsh --connect qemu:///system net-dhcp-leases minikube-net
# list network filters
virsh --connect qemu:///system nwfilter-list
# list real network interfaces
virsh --connect qemu:///system iface-list
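To round this out, the minikube/kubectl side looks roughly like the following; the kvm2 driver flag is an assumption based on the libvirt setup above and flag names vary a bit between minikube versions:
# start a single-node cluster backed by KVM/libvirt
minikube start --driver=kvm2
# check the node is up and run a throwaway deployment
kubectl get nodes
kubectl create deployment hello-node --image=k8s.gcr.io/echoserver:1.4
kubectl get pods
# delete the VM when finished
minikube delete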

3 May 2021

Russell Coker: DNS, Lots of IPs, and Postal

I decided to start work on repeating the tests for my 2006 OSDC paper on Benchmarking Mail Relays [1] and discover how the last 15 years of hardware developments have changed things. There have been software changes in that time too, but nothing that compares with going from single core 32bit systems with less than 1G of RAM and 60G IDE disks to multi-core 64bit systems with 128G of RAM and SSDs. As an aside, the hardware I used in 2006 wasn't cutting edge and the hardware I'm using now isn't either. In both cases it's systems I bought second hand for under $1000. Pedants can think of this as comparing 2004 and 2018 hardware. BIND I decided to make some changes to reflect the increased hardware capacity and use 2560 domains and IP addresses, which gave the following errors as well as a startup time of a minute on a system with two E5-2620 CPUs.
May  2 16:38:37 server named[7372]: listening on IPv4 interface lo, 127.0.0.1#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.2.45#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.40.1#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.40.2#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.40.3#53
[...]
May  2 16:39:33 server named[7372]: listening on IPv4 interface eno4, 10.0.47.0#53
May  2 16:39:33 server named[7372]: listening on IPv4 interface eno4, 10.0.48.0#53
May  2 16:39:33 server named[7372]: listening on IPv4 interface eno4, 10.0.49.0#53
May  2 16:39:33 server named[7372]: listening on IPv6 interface lo, ::1#53
[...]
May  2 16:39:36 server named[7372]: zone localhost/IN: loaded serial 2
May  2 16:39:36 server named[7372]: all zones loaded
May  2 16:39:36 server named[7372]: running
May  2 16:39:36 server named[7372]: socket: file descriptor exceeds limit (123273/21000)
May  2 16:39:36 server named[7372]: managed-keys-zone: Unable to fetch DNSKEY set '.': not enough free resources
May  2 16:39:36 server named[7372]: socket: file descriptor exceeds limit (123273/21000)
The first thing I noticed is that a default configuration of BIND with 2560 local IPs (when just running in the default recursive mode) takes a minute to start and needs to open over 100,000 file handles. BIND also had some errors in that configuration which led to it not accepting shutdown requests. I filed Debian bug report #987927 [2] about this. One way of dealing with the errors in this situation on Debian is to edit /etc/default/named and put in the following line to allow BIND to open that many file handles:
OPTIONS="-u bind -S 150000"
But the best thing to do for BIND when there are many IP addresses that aren't going to be used for DNS service is to put a directive like the following in the BIND configuration to specify the IP address or addresses that are used for the DNS service:
listen-on { 10.0.2.45; };
I have just added the listen-on and listen-on-v6 directives to one of my servers with about a dozen IP addresses. While 2560 IP addresses is an unusual corner case, it's not uncommon to have dozens of addresses on one system. dig When doing tests of Postfix for relaying mail I noticed that mail was being deferred with DNS problems (the error was "Host or domain name not found. Name service error for name=a838.example.com type=MX: Host not found, try again"). I tested the DNS lookups with dig which failed with errors like the following:
dig -t mx a704.example.com
socket.c:1740: internal_send: 10.0.2.45#53: Invalid argument
socket.c:1740: internal_send: 10.0.2.45#53: Invalid argument
socket.c:1740: internal_send: 10.0.2.45#53: Invalid argument
; <<>> DiG 9.16.13-Debian <<>> -t mx a704.example.com
;; global options: +cmd
;; connection timed out; no servers could be reached
Here is a sample of the strace output from tracing dig:
bind(20, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
recvmsg(20, {msg_namelen=128}, 0)       = -1 EAGAIN (Resource temporarily unavailable)
write(4, "\24\0\0\0\375\377\377\377", 8) = 8
sendmsg(20, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.45")}, msg_namelen=16, msg_iov=[{iov_base="86\1 \0\1\0\0\0\0\0\1\4a704\7example\3com\0\0\17\0\1\0\0)\20\0\0\0\0\0\0\f\0\n\0\10's\367\265\16bx\354", iov_len=57}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)
write(2, "socket.c:1740: ", 15)         = 15
write(2, "internal_send: 10.0.2.45#53: Invalid argument", 45) = 45
write(2, "\n", 1)                       = 1
futex(0x7f5a80696084, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f5a80696010, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5a8069809c, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f5a80698020, FUTEX_WAKE_PRIVATE, 1) = 1
sendmsg(20, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.45")}, msg_namelen=16, msg_iov=[{iov_base="86\1 \0\1\0\0\0\0\0\1\4a704\7example\3com\0\0\17\0\1\0\0)\20\0\0\0\0\0\0\f\0\n\0\10's\367\265\16bx\354", iov_len=57}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)
write(2, "socket.c:1740: ", 15)         = 15
write(2, "internal_send: 10.0.2.45#53: Invalid argument", 45) = 45
write(2, "\n", 1)
Ubuntu bug #1702726 claims that an insufficient ARP cache was the cause of dig problems [3]. At the time I encountered the dig problems I was seeing lots of kernel error messages neighbour: arp_cache: neighbor table overflow which I solved by putting the following in /etc/sysctl.d/mine.conf:
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh1 = 1024
Making that change (and having rebooted because I didn't need to run the server overnight) didn't entirely solve the problems. I have seen some DNS errors from Postfix since then but they are less common than before. When they happened I didn't have that error from dig. At this stage I'm not certain that the ARP change fixed the dig problem although it seems likely (it's always difficult to be certain that you have solved a race condition instead of made it less common or just accidentally changed something else to conceal it). But it is clearly a good thing to have a large enough ARP cache so the above change is probably the right thing for most people (with the possibility of changing the numbers according to the required scale). Also people having that dig error should probably check their kernel message log; if the ARP cache isn't the cause then some other kernel networking issue might be related.

Preliminary Results With Postfix I'm seeing around 24,000 messages relayed per minute with more than 60% CPU time idle. I'm not sure exactly how to count idle time when there are 12 CPU cores and 24 hyper-threads, as having only 1 process scheduled for each pair of hyperthreads on a core is very different to having half the CPU cores unused. I ran my script to disable hyper-threads by telling the Linux kernel to disable each processor core that has the same core ID as another; it was buggy and disabled the second CPU altogether (better than finding this out on a production server). Going from 24 hyper-threads of 2 CPUs to 6 non-HT cores of a single CPU didn't change the throughput and the idle time went to about 30%, so I have possibly halved the CPU capacity for these tasks by disabling all hyper-threads and one entire CPU, which is surprising given that I theoretically reduced the CPU power by 75%. I think my focus now has to be on hyper-threading optimisation. Since 2006 the performance has gone from ~20 messages per minute on relatively commodity hardware to 24,000 messages per minute on server equipment that is uncommon for home use but which is also within range of home desktop PCs. I think that a typical desktop PC with a similar speed CPU, 32G of RAM and SSD storage would give the same performance. Moore's Law (that transistor count doubles approximately every 2 years) is often misquoted as having performance double every 2 years. In this case more than 1024* the performance over 15 years means the performance doubling every 18 months. Probably most of that is due to SATA SSDs massively outperforming IDE hard drives but it's still impressive.

Notes I've been using example.com for test purposes for a long time, but RFC2606 specifies .test, .example, and .invalid as reserved top level domains for such things. On the next iteration I'll change my scripts to use .test. My current test setup has a KVM virtual machine running my bhm program to receive mail which is taking between 20% and 50% of a CPU core in my tests so far. While that is happening the kvm process is reported as taking between 60% and 200% of a CPU core, so kvm takes as much as 4* the CPU of the guest due to the virtual networking overhead even though I'm using the virtio-net-pci driver (the most efficient form of KVM networking for emulating a regular ethernet card). I've also seen this in production with a virtual machine running a Tor relay node. I've fixed a bug where Postal would try to send the SMTP quit command after encountering a TCP error which would cause an infinite loop and SEGV.
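The idea behind the hyper-thread disabling script mentioned above is simple enough to sketch; this is not that (buggy) script, just an illustration of the sysfs interface, to be run as root:
#!/bin/bash
# keep one thread per physical core: offline any CPU whose (package, core)
# pair has already been seen; cpu0 sorts first so it is never offlined
declare -A seen
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    [ -e "$cpu/topology/core_id" ] || continue
    key="$(cat "$cpu/topology/physical_package_id"):$(cat "$cpu/topology/core_id")"
    if [ -n "${seen[$key]}" ]; then
        echo 0 > "$cpu/online"
    else
        seen[$key]=1
    fi
done
Keying on both the package ID and the core ID is what avoids taking out an entire second CPU by mistake.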

30 March 2021

Dirk Eddelbuettel: x13binary 1.1.39-3 on CRAN: (Imperfect) Package Updates

A new release 1.1.39-3 of x13binary, of the X-13ARIMA-SEATS program by the US Census Bureau (with upstream release 1.1.39), is now on CRAN. The x13binary package takes the pain out of installing X-13ARIMA-SEATS by making it a fully resolved CRAN dependency. For example, when installing the excellent seasonal package by Christoph, X-13ARIMA-SEATS will get pulled in via the x13binary package and things just work. Just depend on x13binary and on all major OSs supported by R you should have an X-13ARIMA-SEATS binary installed which will be called seamlessly by the higher-level packages such as seasonal or gunsales. With this, the full power of what is likely the world's most sophisticated deseasonalization and forecasting package is now at your fingertips and the R prompt, just like any other of the 17350+ CRAN packages. You can read more about this (and the seasonal package) in the Journal of Statistical Software paper by Christoph and myself. This release was needed because the recent M1mac build was reporting leftover detritus in the temporary directory, which we addressed with an explicit removal at the end. We also addressed another CRAN Policy change since the last release, namely a conversion of the configure script from bash to sh. Now, sadly, that second aspect blew up on Solaris, and the detritus issue appears to persist. By now Christoph and a colleague have installed R(-devel) on such an M1 machine, but still cannot reproduce. We will reach out to CRAN to learn more. A follow-up release 1.1.39-4 is likely. The good news is that the standard macOS binary works on M1, as do other binaries, thanks to the translation layer. We do however lack a genuine binary for Solaris, so if any of the esteemed readers of this post happens to have access to R on Solaris along with a basic Fortran compiler, we would love to hear from you. Building X-13ARIMA-SEATS from source on Solaris should be as straightforward as it is on the other OSs. Courtesy of my CRANberries, there is also a diffstat report for this release showing changes to the previous release. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

26 March 2021

Daniel Lange: The Stallman wars

So, 2021 isn't bad enough yet, but don't despair, people are working to fix that:

Welcome to the Stallman wars
Team Cancel: https://rms-open-letter.github.io/ (repo)
Team Support: https://rms-support-letter.github.io/ (repo)
Current stats are:

Team Cancel:  3028 signers from 1413 individual commit authors
Team Support: 6249 signers from 5018 individual commit authors
Git shortlog (Top 10):
rms_cancel.git (Last update: 2021-04-07 15:42:33 (UTC))
  1228  Neil McGovern
   251  Joan Touzet
    86  Elana Hashman
    71  Molly de Blanc
    36  Shauna
    19  Juke
    18  Stefano Zacchiroli
    17  Alexey Mirages
    16  Devin Halladay
    14  Nader Jafari
rms_support.git (Last update: 2021-04-12 09:25:53 (UTC))
  1678  shenlebantongying
  1564  nukeop
  1550  Ivanq
   826  Victor
   746  Job Bautista
   123  nekonee
    61  Victor Gridnevsky
    38  Patrick Spek
    25  Borys Kabakov
    17  KIM Taeyeob
(last updated 2021-04-12 09:26:15 (UTC)) Technical info:
Signers are counted from their "Signed / Individuals" sections. Commits are counted with git shortlog -s.
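For instance, inside a checkout of either petition repository linked above, the per-author commit counts come from something like the following (the .mailmap line is a made-up example of the kind of entry mentioned below for people who committed under more than one name):

# commits per author, most active first
git shortlog -sn | head
# a .mailmap entry folds alternate identities into one canonical author, e.g.:
# Canonical Name <canonical@example.org> Alternate Name <alias@example.org>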
Team Cancel also has organizational signatures with Mozilla, Suse and X.Org being among the notable signatories. Debian is in the process of running a GR to join (or not join) that list. The 16 original signers of the Cancel petition are added in their count. Neil McGovern, Juke and shenlebantongying need .mailmap support as they have committed with different names. Further reading:

23 March 2021

Antoine Beaupr : Major email crash with syncmaildir

TL;DR: lost half my mail (150,000 messages, ~6GB) last night. Cause uncertain, but possibly a combination of a dead CMOS battery, systemd OnCalendar=daily, a (locking?) bug in syncmaildir, and generally, a system too exotic and complicated.

The crash So I somehow lost half my mail:
anarcat@angela:~(main)$ du -sh Maildir/
7,9G    Maildir/
anarcat@curie:~(main)$ du -sh Maildir
14G     Maildir
anarcat@marcos:~$ du -sh Maildir
8,0G    Maildir
Those are three different machines:
  • angela: my laptop, not always on
  • curie: my workstation, mostly always on
  • marcos: my mail server, always on
Those mails are synchronized using a rather exotic system based on SSH, syncmaildir and rsendmail. The anomaly started on curie:
-- Reboot --
mar 22 16:13:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:13:00 curie smd-pull[4801]: rm: impossible de supprimer '/home/anarcat/.smd/workarea/Maildir': Le dossier n'est pas vide
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:13:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:14:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:14:00 curie smd-pull[7025]:  4091 ?        00:00:00 smd-push
mar 22 16:14:00 curie smd-pull[7025]: Already running.
mar 22 16:14:00 curie smd-pull[7025]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:14:00 curie smd-pull[7025]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 4091) run(rm /home/anarcat/.smd/lock))
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:14:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
Then it seems like smd-push (from curie) started destroying the universe for some reason:
mar 22 16:20:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:20:00 curie smd-pull[9319]:  4091 ?        00:00:00 smd-push
mar 22 16:20:00 curie smd-pull[9319]: Already running.
mar 22 16:20:00 curie smd-pull[9319]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:20:00 curie smd-pull[9319]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(ru
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:20:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:21:34 curie smd-push[4091]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:21:35 curie smd-push[9374]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:21:35 curie systemd[3199]: smd-push.service: Succeeded.
Notice the del-mails(293920) there: it is actively trying to destroy basically every email in my mail spool. Then somehow push and pull started both at once:
mar 22 16:21:35 curie systemd[3199]: Started push emails with syncmaildir.
mar 22 16:21:35 curie systemd[3199]: Starting push emails with syncmaildir...
mar 22 16:22:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:22:00 curie smd-pull[10333]:  9455 ?        00:00:00 smd-push
mar 22 16:22:00 curie smd-pull[10333]: Already running.
mar 22 16:22:00 curie smd-pull[10333]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:22:00 curie smd-pull[10333]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: Data transmission failed.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marco
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open requested file.
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:22:00 curie smd-push[9455]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
There it seems push tried to destroy the universe again: del-mails(293920). Interestingly, the push started again in parallel with the pull, right that minute:
mar 22 16:22:00 curie systemd[3199]: Starting push emails with syncmaildir...
... but didn't complete for a while, here's pull trying to start again:
mar 22 16:24:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:24:00 curie smd-pull[12051]: 10466 ?        00:00:00 smd-push
mar 22 16:24:00 curie smd-pull[12051]: Already running.
mar 22 16:24:00 curie smd-pull[12051]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:24:00 curie smd-pull[12051]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 10466) run(rm /home/anarcat/.smd/lock))
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
... and the long push finally resolving:
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: No such file or directory
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open requested file.
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie smd-push[10466]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
mar 22 16:24:00 curie systemd[3199]: Starting push emails with syncmaildir...
This pattern repeats until 16:35, when that locking issue silently recovered somehow:
mar 22 16:35:03 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:35:41 curie smd-pull[20788]: default: smd-client@localhost: TAGS: stats::new-mails(5), del-mails(1), bytes-received(21885), xdelta-received(6863398)
mar 22 16:35:42 curie smd-pull[21373]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:35:42 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:35:42 curie systemd[3199]: Started pull emails with syncmaildir.
mar 22 16:36:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:36:36 curie smd-pull[21738]: default: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(214)
mar 22 16:36:37 curie smd-pull[21816]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:36:37 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:36:37 curie systemd[3199]: Started pull emails with syncmaildir.
... notice that huge xdelta-received there, that's 7GB right there. Mysteriously, the curie mail spool survived this, possibly because smd-pull started failing again:
mar 22 16:38:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:38:00 curie smd-pull[23556]: 21887 ?        00:00:00 smd-push
mar 22 16:38:00 curie smd-pull[23556]: Already running.
mar 22 16:38:00 curie smd-pull[23556]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:38:00 curie smd-pull[23556]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 21887) run(rm /home/anarcat/.smd/lock))
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:38:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
That could have been when I got on angela to check my mail, and it was busy doing the nasty removal stuff... although the times don't match. Here is when angela came back online:
anarcat@angela:~(main)$ last
anarcat  :0           :0               Mon Mar 22 19:57   still logged in
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 19:57   still running
anarcat  :0           :0               Mon Mar 22 17:43 - 18:47  (01:03)
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 17:39   still running
Then finally the sync on curie started failing with:
mar 22 16:46:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:46:42 curie smd-pull[27455]: smd-server: ERROR: Client aborted, removing /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.new and /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.mtime.new
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: Failed to copy Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: The destination already exists but its content differs.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: To fix this problem you have two options:
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - rename Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S by hand so that Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   can be copied without replacing it.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   Executing  cd; mv -n "Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "Maildir/.koumbit/cur/1616446002.1.localhost"  should work.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - run smd-push so that your changes to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   are propagated to the other mailbox
mar 22 16:46:42 curie smd-pull[27455]: default: smd-client@localhost: TAGS: error::context(copy-message) probable-cause(concurrent-mailbox-edit) human-intervention(necessary) suggested-actions(run(mv -n "/home/anarcat/.smd/workarea/Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "/home/anarcat/.smd/workarea/Maildir/.koumbit/tmp/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S") run(smd-push default))
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:46:42 curie systemd[3199]: Failed to start pull emails with syncmaildir.
It went on like this until I found the problem. This is, presumably, a good thing because those emails were not being destroyed. On angela, things looked like this:
-- Reboot --
mar 22 17:39:29 angela systemd[1677]: Started run notmuch new at least once a day.
mar 22 17:39:29 angela systemd[1677]: Started run smd-pull regularly.
mar 22 17:40:46 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open Maildir/.tor/new/1616446842.M285912P26118.marcos,S=8860,W=8996: Maildir/.tor/new/1616446842.M285912P26118.marcos,S=886
0,W=8996: No such file or directory
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open requested file.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: Data transmission failed.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: This problem is transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: server sent ABORT or connection died
mar 22 17:43:18 angela smd-pull[3916]: default: smd-server@smd-server-anarcat: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested
-actions(retry)
mar 22 17:43:18 angela smd-pull[3916]: default: smd-client@localhost: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Failed with result 'exit-code'.
mar 22 17:43:18 angela systemd[1677]: Failed to start pull emails with syncmaildir.
mar 22 17:43:18 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:29 angela smd-pull[4847]: default: smd-client@localhost: TAGS: stats::new-mails(29), del-mails(0), bytes-received(401519), xdelta-received(38914)
mar 22 17:43:29 angela smd-pull[5600]: register: smd-client@localhost: TAGS: stats::new-mails(2), del-mails(0), bytes-received(92150), xdelta-received(471)
mar 22 17:43:29 angela systemd[1677]: smd-pull.service: Succeeded.
mar 22 17:43:29 angela systemd[1677]: Started pull emails with syncmaildir.
mar 22 17:43:29 angela systemd[1677]: Starting push emails with syncmaildir...
mar 22 17:43:32 angela smd-push[5693]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(217)
mar 22 17:43:33 angela smd-push[6575]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(219)
mar 22 17:43:33 angela systemd[1677]: smd-push.service: Succeeded.
mar 22 17:43:33 angela systemd[1677]: Started push emails with syncmaildir.
Notice how long it took to get the first error, in that first failure: it failed after 3 minutes! Presumably that's when it started deleting all that mail. And this is during pull, not push, so the error didn't come from angela.

Affected data It seems 2GB of mail from my main INBOX was destroyed. Another 2.4GB of spam (kept for training purposes) was also destroyed, along with 700MB of Sent mail. The rest is hard to figure out, because the folders are actually still there, just smaller. So I relied on ncdu to figure out the size changes. (Note that I don't really archive (or delete much of) my mail since I use notmuch, which is why the INBOX is so large...) Concretely, according to the notmuch-new.service which still runs periodically on marcos, here are the changes that happened on the server:
mar 22 16:17:12 marcos notmuch[10729]: Added 7 new messages to the database. Removed 57985 messages. Detected 1372 file renames.
mar 22 16:22:43 marcos notmuch[12826]: No new mail. Removed 143842 messages. Detected 6072 file renames.
mar 22 16:27:02 marcos notmuch[13969]: No new mail. Removed 82071 messages. Detected 1783 file renames.
mar 22 16:29:45 marcos notmuch[15079]: Added 22743 new messages to the database. Detected 1 file rename.
mar 22 16:31:48 marcos notmuch[16196]: Added 22779 new messages to the database. Removed 5 messages.
mar 22 16:33:11 marcos notmuch[17192]: Added 3711 new messages to the database.
mar 22 16:40:41 marcos notmuch[19122]: Added 74558 new messages to the database. Detected 1 file rename.
mar 22 16:43:21 marcos notmuch[20325]: Added 9061 new messages to the database. Detected 4 file renames.
mar 22 17:43:08 marcos notmuch[7420]: Added 1793 new messages to the database. Detected 6 file renames.
That is basically the entire mail spool destroyed at first (283 898 messages), and then bits and pieces of it progressively re-added (134 645 messages), somehow, so 149 253 mails were lost, presumably.

Recovery I disabled the services all over the place:
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
(Well, technically, I did that only on angela, as I thought the problem was there. Luckily, curie kept going but it seems like it was harmless.) I made a backup of the mail spool on curie:
tar cf - Maildir/ | pv -s 14G | gzip -c > Maildir.tgz
Then I crossed my fingers and ran smd-push -v -s, as that was suggested by smd error codes themselves. That thankfully started restoring mail. It failed a few times on weird cases of files being duplicates, but I resolved this by following the instructions. Or mostly: I actually deleted the files instead of moving them, which made smd even unhappier (if there ever was such a thing). I had to recreate some of those files, so, lesson learned: do follow the advice smd gives you, even if it seems useless or strange. But then smd-push was humming along, uploading tens of thousands of messages, saturating the upload in the office, refilling the mail spool on the server... yaay!... ? Except... well, of course that didn't quite work: the mail spool on the server eventually started to grow beyond the size of the mail spool on the workstation. That is what smd-push eventually settled on:
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(151697), del-mails(0), bytes-received(7539147811), xdelta-received(10881198)
It recreated 151 697 emails, adding about 2000 emails to the pool, kind of from nowhere at all. On marcos, before:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,0 GiB [##########] /.notmuch
  717,3 MiB [#         ] /.Archives.2014
  498,2 MiB [#         ] /.feeds.debian-planet
  453,1 MiB [#         ] /.Archives.2012
  414,5 MiB [#         ] /.debian
  408,2 MiB [#         ] /.quoifaire
  389,8 MiB [          ] /.rapports
  356,6 MiB [          ] /.tor
  182,6 MiB [          ] /.koumbit
  179,8 MiB [          ] /tmp
   56,8 MiB [          ] /.nn
   43,0 MiB [          ] /.act-mtl
   32,6 MiB [          ] /.feeds.sysadvent
   31,7 MiB [          ] /.feeds.releases
   31,4 MiB [          ] /.Sent.2005
   26,3 MiB [          ] /.sage
   25,5 MiB [          ] /.freedombox
   24,0 MiB [          ] /.feeds.git-annex
   21,1 MiB [          ] /.Archives.2011
   19,1 MiB [          ] /.Sent.2003
   16,7 MiB [          ] /.bugtraq
   16,2 MiB [          ] /.mlug
 Total disk usage:   8,0 GiB  Apparent size:   7,6 GiB  Items: 184426
After:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,7 GiB [##########] /.notmuch
    2,7 GiB [#####     ] /.junk
    1,9 GiB [###       ] /cur
  717,3 MiB [#         ] /.Archives.2014
  659,3 MiB [#         ] /.Sent
  513,9 MiB [#         ] /.Archives.2012
  498,2 MiB [#         ] /.feeds.debian-planet
  449,6 MiB [          ] /.Archives.2015
  414,5 MiB [          ] /.debian
  408,2 MiB [          ] /.quoifaire
  389,8 MiB [          ] /.rapports
  380,8 MiB [          ] /.Archives.2013
  356,6 MiB [          ] /.tor
  261,1 MiB [          ] /.Archives.2011
  240,9 MiB [          ] /.koumbit
  183,6 MiB [          ] /.Archives.2010
  179,8 MiB [          ] /tmp
  128,4 MiB [          ] /.lists
  106,1 MiB [          ] /.inso-interne
  103,0 MiB [          ] /.github
   75,0 MiB [          ] /.nanog
   69,8 MiB [          ] /.full-disclosure
 Total disk usage:  16,2 GiB  Apparent size:  15,5 GiB  Items: 341143
That is 156 717 files more. On curie:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------------------------------------
    2,7 GiB [##########] /.junk
    2,3 GiB [########  ] /.notmuch
    1,9 GiB [######    ] /cur
  661,2 MiB [##        ] /.Archives.2014
  655,3 MiB [##        ] /.Sent
  512,0 MiB [#         ] /.Archives.2012
  447,3 MiB [#         ] /.Archives.2015
  438,5 MiB [#         ] /.feeds.debian-planet
  406,5 MiB [#         ] /.quoifaire
  383,6 MiB [#         ] /.debian
  378,6 MiB [#         ] /.Archives.2013
  303,3 MiB [#         ] /.tor
  296,0 MiB [#         ] /.rapports
  237,6 MiB [          ] /.koumbit
  233,2 MiB [          ] /.Archives.2011
  182,1 MiB [          ] /.Archives.2010
  127,0 MiB [          ] /.lists
  104,8 MiB [          ] /.inso-interne
  102,7 MiB [          ] /.register
   89,6 MiB [          ] /.github
   67,1 MiB [          ] /.full-disclosure
   66,5 MiB [          ] /.nanog
 Total disk usage:  13,3 GiB  Apparent size:  12,6 GiB  Items: 342465
Interestingly, there are more files, but less disk usage. It's possible the notmuch database there is more efficient. So maybe there's nothing to worry about. Last night's marcos backup has:
root@marcos:/home/anarcat# find /mnt/home/anarcat/Maildir | pv -l | wc -l
 341k 0:00:16 [20,4k/s] [                             <=>                                                                                                                                     ]
341040
... 341040 files, which seems about right, considering some mail was delivered during the day. An audit can be performed with hashdeep:
borg mount /media/sdb2/borg/::marcos-auto-2021-03-22 /mnt
hashdeep -c sha256 -r /mnt/home/anarcat/Maildir | pv -l -s 341k > Maildir-backup-manifest.txt
And then compared with:
hashdeep -c sha256 -k Maildir-backup-manifest.txt Maildir/
Some extra files should show up in the Maildir, and very few should actually be missing, because I shouldn't have deleted mail from the previous day on the next day, or at least very little of it. The actual summary hashdeep gave me was:
hashdeep: Audit failed
   Input files examined: 0
  Known files expecting: 0
          Files matched: 339080
Files partially matched: 0
            Files moved: 782
        New files found: 107
  Known files not found: 106
So 107 files added, 106 deleted. Seems good enough for me... Postfix was stopped at Mar 22 21:12:59 to try and stop external events from confusing things even further. I reviewed the delivery log to see if mail that came in during the problem window disappeared:
grep 'dovecot:.*stored mail into mailbox' /var/log/mail.log |
  tail -20 |
  sed 's/.*msgid=<//;s/>.*//' |
  while read msgid; do
    notmuch count --exclude=false id:$msgid |
      grep 0 && echo $msgid missing;
  done
And things looked okay. Now of course if we go further back, we find mail I actually deleted (because I do do that sometimes), so it's hard to use this log as an audit trail. We can only hope that the curie spool is sufficiently coherent to be relied on. Worst case, we'll have to restore from last night's backup, but that's getting far away now: I get hundreds of mails a day in that mail spool, and resetting back to last night does not seem like a good idea. A dry run of smd-pull on angela seems to agree that it's missing some files:
default: smd-client@localhost: TAGS: stats::new-mails(154914), del-mails(0), bytes-received(0), xdelta-received(0)
... a number of mails somewhere in between the other two, go figure. A "wet" run of this was started, without deletion (-n), which gave us:
default: smd-client@localhost: TAGS: stats::new-mails(154911), del-mails(0), bytes-received(7658160107), xdelta-received(10837609)
Strange that it sync'd three less emails, but that's still better than nothing, and we have a mail spool on angela again:
anarcat@angela:~(main)$ notmuch new
purging with prefix '.': spam moved (0), ham moved (0), deleted (0), done
Note: Ignoring non-mail file: /home/anarcat/Maildir//.uidvalidity
Processed 1779 total files in 26s (66 files/sec.).
Added 1190 new messages to the database. Removed 3 messages. Detected 593 file renames.
tagging with prefix '.': spam, sent, feeds, koumbit, tor, lists, rapports, folders, done.
Notice how only 1190 messages were re-added, that is because I killed notmuch before it had time to remove all those mails from its database.

Possible causes I am totally at a loss as to why smd started destroying everything like it did. But a few things come to mind:
  1. I rewired my office on that day.
  2. This meant unplugging curie, the workstation.
  3. It has a bad CMOS battery (known problem), so it jumped around the time continuum a few times, sometimes by years.
  4. The smd services are run from a systemd unit with OnCalendar=*:0/2. I have heard that it's possible that major time jumps "pile up" execution of jobs, and it seems this happened in this case (see the sketch after this list).
  5. It's possible that locking in smd is not as great as it could be, and that it corrupted its internal data structures on curie, which led it to command a destruction of the remote mail spool.
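For context, here is a minimal sketch of the kind of user timer involved; the file name and everything except the Description and OnCalendar lines (which appear in the logs and the list above) are assumptions for illustration, not the actual configuration:

# ~/.config/systemd/user/smd-pull.timer (illustrative sketch)
[Unit]
Description=run smd-pull regularly

[Timer]
# fire every two minutes, on the minute
OnCalendar=*:0/2
# Persistent=true would additionally replay runs considered "missed";
# combined with a clock jumping by years, that is one way jobs can pile up
Persistent=false
# a small random delay also avoids firing at the exact same second as smd-push
RandomizedDelaySec=15

[Install]
WantedBy=timers.target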
It's also possible that there was a disk failure on the server, marcos. But since it's running on a (software) RAID-1 array, and no errors have been found (according to dmesg), I don't think that's a plausible hypothesis.

Lessons learned
  1. follow what smd says, even if it seems useless or strange.
  2. trust but verify: just backup everything before you do anything, especially the largest data set.
  3. daily backups are not great for email, unless you're ready to lose a day of email (which I'm not).
  4. hashdeep is great. I keep finding new use cases for it. Last time it was to audit my camera SD card to make sure I didn't forget anything, and now this. it's fast and powerful.
  5. borg is great too. the FUSE mount was especially useful, and it was pretty fast to explore the backup, even through that overhead: checksumming 15GB of mail took about 35 minutes, which gives a respectable 8MB/s, probably bottlenecked by the crap external USB drive I use for backups (!).
  6. I really need to finish my backup system so that I have automated offsite backups, although in this case that would actually have been much slower (certainly not 8MB/s!).

Workarounds and solutions I set up fake-hwclock on curie, so that the next power failure will not upset my clock that badly. I am thinking of switching to ZFS or BTRFS for most of my filesystems, so that I can use filesystem snapshots (including remotely!) as a backup strategy. This seems so much more powerful than crawling the filesystem for changes, and allows for truly offsite backups protected from an attacker (hopefully). But it's a long way there. I'm also thinking of rebuilding my mail setup without smd. It's not the first time something like this happens with smd. It's the first time I am more confident it's the root cause of the problem, however, and it makes me really nervous for the future. I have used offlineimap in the past and it seems it was finally ported to Python 3 so that could be an option again. isync/mbsync is another option, which I tried before but do not remember why I didn't switch. A complete redesign with something like getmail and/or nncp could also be an option. But alas, I lack the time to go crazy with those experiments. Somehow, doing like everyone else and just going with Google still doesn't seem to be an option for me. Screw big tech. But I am afraid they will win, eventually. In any case, I'm just happy I got mail again, strangely.
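As a rough illustration of that snapshot-based idea, a minimal sketch of incremental ZFS replication of the dataset holding the Maildir could look like this (the pool, dataset, and host names are hypothetical, and this is only one possible setup):

# initial full copy of the dataset (names are made up)
zfs snapshot tank/mail@2021-03-22
zfs send tank/mail@2021-03-22 | ssh backup.example.net zfs receive -u backup/mail
# later runs only ship the delta between the previous and the new snapshot
zfs snapshot tank/mail@2021-03-23
zfs send -i tank/mail@2021-03-22 tank/mail@2021-03-23 | ssh backup.example.net zfs receive -u backup/mail

Since the receiving side keeps its own snapshot history, a mass deletion like the one above would arrive as just another snapshot, and the earlier snapshots would still be there to restore from (assuming old snapshots are retained on the backup host).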
